DuckDB's Wild Ride in the Open Source World ft. Co-creator Mark Raasveldt
2023/09/08
Mehdi interviews Mark Raasveldt, CTO of DuckDB Labs and co-creator of DuckDB, just before DuckCon in San Francisco. He aimed to gain some insight into the behind-the-scenes workings of the open-source project. Enjoy the conversation!
Transcript
0:05 Mark, it's super nice to have you for a small chat. Everything is prepared for DuckCon, right? Yeah. How are you feeling? Amazing, what can I say? We're here in San Francisco, we just had a fantastic keynote by Hannes at the Data + AI Summit, and now we're doing DuckCon in front of
0:23 a whole new crowd, halfway around the world. That's true. It's not the same audience that you're used to. It's the first time in the US, right? Yes, the first time we're doing DuckCon in the US. I mean, we have done three DuckCons now: one online, one in Brussels, and now one in San
0:42 Francisco. Originally our plan was to do one per year, but somehow this is already our second one this year. So what can I say, it's just a lot of fun. So you're CTO at DuckDB Labs, and you're a co-creator of DuckDB. What's your relationship to the product today, compared to Hannes? Yeah, I
1:04 think I still do a lot of technical work. I myself still push out a lot of programming work; Hannes does as well, but less so. In the end, as CTO I do a lot of the technical management of people: I guide them with the problems they encounter in the code base, I
1:26 review their pull requests, I look at what they're working on and guide them in what they should be working on as well. That does require me to have quite in-depth knowledge of essentially the whole code base, and in order to maintain that, I also have to keep programming. It's kind of the way I
1:45 feel, as well as just what I enjoy. We started this because we enjoyed it, so I still do a lot of programming. Hannes still does programming as well, but in terms of sheer numbers, let's say I do about three days a week and Hannes about one day a week. That's roughly the division.
2:01 And there are more people now, right? Many more. How many? We're now around 15 people, and that has definitely changed my role significantly. I used to just be programming five days a week, or six, depending on how I felt. Now I have meetings. Well, we don't really have
2:23 many scheduled meetings, but people come and ask me questions, and of course I'm happy to answer them. Often I'm the only one who's able to answer them. In a sense that's very nice, because it actually means we get a lot more done. As a team we can focus on different things; different
2:41 people can do different things, and as a team we accomplish much more than what just Hannes and I could accomplish. So it is a big productivity booster to have all these people, but for me personally, of course, it takes time away from programming. Which is actually fine; I do enjoy guiding and mentoring people as well.
2:59 But it's definitely a change of scenery from where we were when we started DuckDB. Nice. What's your biggest challenge at the moment regarding open source management?
3:10 I think at the moment everything is going quite smoothly. We have had different
3:16 challenges with outside contributors, of course. Mostly the thing is, we allow outside contributions, and I think that's important because I want people from the community to feel like they can contribute and leave their mark on the product as well. But of course there is a time investment from our side when it comes to vetting the code,
3:39 making sure it all works and, well, essentially we have to gatekeep the code so that other users don't run into issues caused by the few people who do contribute and submit pull requests. So we have had some cases where we had to refuse certain changes
3:58 because they were just too large for an outside contributor, and we didn't have the time to look at, say, a PR that changes 200 files. It would take us longer to look at the PR, vet it and make sure everything works than it would if we just made it ourselves. And also, with outside
4:16 contributors, they can kind of just throw this over the fence and it becomes our problem. We have to make sure the code base is always in a good state, and the outside contributors don't have any of this obligation. So there are some inherent responsibilities that get shifted when someone opens a pull request. And I
4:35 wouldn't say it's... they do have their credibility, you know. Yeah, and you're the ones maintaining it. There are people who have made contributions that were definitely quite valuable, but in the end we had to either rewrite or rework them in quite significant ways, to the point where it probably cost us more time than if
4:57 we had just done it ourselves. It's always something to balance with outside contributors: it's nice, but it's also a risk, let's say. And what would be the best advice you'd give an outside contributor to maximize the chance of getting their PR merged? Yeah, so I think the first thing that we
5:20 advise people to do is to talk to us. Actually ask on Discord, or open a discussion on
5:30 GitHub and have a chat, because we have seen that your PR is most likely to get rejected if you just open a big PR without discussing it first. If you come to us and say, "Hey, I want to implement this feature," or "I see this issue in
5:46 the project, can I work on it?" and we say yes, then we're of course much less likely to say later, "OK, we don't want this." We may still say that it's not of sufficient quality, or give pointers like "you should improve the test coverage, you should add more tests, you should rewrite this piece
6:04 of code." But in principle, if we say we would like to see you work on this, then that's all right. We have had some people who unfortunately did a lot of work on something that they thought was important, but that in the end we could not merge, because it
6:21 was just a giant pull request for a feature that we didn't really want. Right, so discuss first. It's actually the same mindset as in any software engineering team, right? Instead of diving straight in, even if you have a requirement and a ticket, if you're not sure about it, you
6:40 can always double-check, exactly, to avoid a long and painful discussion in the PR. No, for sure. And I think one thing that's quite nice about where we're going right now is that we have this extension system, and I think that takes away a lot of these problems, because people can write their own
6:57 extension, or they can contribute to an external extension, and we don't even have to interface with that. They can make their own extension that provides the feature they want, and work on it completely separately from us. We don't need to see the code, we don't need to interact with the code, and then they can
7:13 have the joy of doing whatever they want, but we won't be stuck holding the bag, so to speak, having to maintain that piece of software. It is then also fully their responsibility. That's great. Cool. What's the feature you're most bullish on in DuckDB, among those that have already been developed? I
7:36 think... I mean, there have been very many. The feature that I have personally developed in the last few months that has seen the most praise has to be PIVOT and UNPIVOT. That was actually something I decided to work on because I was looking at the
7:54 GitHub issues and I saw that that issue had the most upvotes, so I figured, well, people must like this. Then I started working on it, and immediately tons of people were very happy. They all responded, people were testing it; it immediately saw a lot of success, and I was very happy with
8:12 that, of course, to make something that's truly useful to a lot of people. Cool. Can you give us a short explanation of the feature for people who aren't familiar with it? Yeah, sure. Pivoting and unpivoting: if you know pivot tables in Excel, it's based on that. Essentially it means turning
8:30 columns into rows, or rows into columns. That's kind of a cool feature, because in SQL you normally cannot query columns in the same way that you can query rows. If you have rows, you can filter on them, you can group on them, you can do all these operations, because that's the way the language is designed. By being able
8:49 to turn columns into rows, you can then do all these operations on columns. That's super helpful, because a lot of data sets actually encode data points as columns. Imagine you have a data set with the population per year, where the columns are population 2000, 2001, 2002, 2003, and
9:08 each of those is actually a data point. Then if you want to reshape that data to do a grouping over the columns instead, which is essentially what pivoting is, you need this pivot. So it's less code to get the same functionality? Yeah. Doing this stuff, you
9:29 can do it yourself in SQL as well, but it's just way more work, and in particular it's work that scales linearly with the number of columns: for every single column you have, you need to add something. So being able to do this in an easier way is very valuable. Cool. Last question: if you had 100 developers for a month
9:49 that could do anything for you on DuckDB, what would you do? That's a very, very nice question. So, 100 developers... I think you always have this issue when you have a lot of developers that they kind of butt heads and get into conflicts. So if I had 100 developers, I'd probably put them on 100 different things. That's a
10:09 lot of things to name right now, but something that I have personally wanted to do but haven't really had time for is tooling around DuckDB: for example, a SQL formatter based on DuckDB, a better syntax highlighter, or better autocomplete, stuff like that. I think that's a
10:27 lot of independent parts that each of these developers could work on. It could all be extensions? Exactly, they don't need to communicate for that. And I definitely think that's something that could be super useful to people, and also something I would like to use myself in a lot of situations. Often
10:46 what inspires me to do things is what would be useful to me. Cool. Thank you very much for your time, Mark. I'll let you relax a bit before everybody shows up, and I'll see you soon.
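Mark's explanation of pivoting and unpivoting (8:12 to 9:29) can be made concrete with a small sketch. This is plain Python illustrating what the reshaping does to a table's shape, not how DuckDB implements it. The population numbers are made up, and the helpers `unpivot`, `pivot` and `manual_unpivot_sql` are hypothetical names for illustration only. The SQL in the comments follows DuckDB's documented `UNPIVOT`/`PIVOT` syntax, but the table names are assumptions, and the generated SQL is only built as a string, never executed.

```python
# Illustrative sketch only: plain Python showing what UNPIVOT / PIVOT do to a
# table's shape. This is NOT how DuckDB implements them internally.

# Hypothetical wide data set (made-up numbers): one row per country,
# one column per year, as in the population example from the interview.
wide_rows = [
    {"country": "A", "2000": 100, "2001": 110, "2002": 120},
    {"country": "B", "2000": 200, "2001": 210, "2002": 220},
]


def unpivot(rows, id_col, name_col, value_col):
    """Columns -> rows: every non-id column becomes its own (name, value) row."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                long_rows.append({id_col: row[id_col], name_col: col, value_col: val})
    return long_rows


def pivot(rows, id_col, name_col, value_col):
    """Rows -> columns: spread (name, value) rows back out, one column per name."""
    wide = {}
    for row in rows:
        out = wide.setdefault(row[id_col], {id_col: row[id_col]})
        out[row[name_col]] = row[value_col]
    return list(wide.values())


def manual_unpivot_sql(table, id_col, value_cols, name_col, value_col):
    """Without UNPIVOT, plain SQL needs one SELECT branch per column,
    so the query text grows linearly with the number of columns."""
    branches = [
        f"SELECT {id_col}, '{c}' AS {name_col}, \"{c}\" AS {value_col} FROM {table}"
        for c in value_cols
    ]
    return "\nUNION ALL\n".join(branches)


# For comparison, DuckDB's statements express each reshaping once, regardless
# of how many year columns there are (table names here are hypothetical):
#   UNPIVOT population ON COLUMNS(* EXCLUDE (country))
#   INTO NAME year VALUE population;
#   PIVOT population_long ON year USING sum(population);

if __name__ == "__main__":
    long_rows = unpivot(wide_rows, "country", "year", "population")
    for r in long_rows:
        print(r)
    # Pivoting the long rows back restores the original wide layout.
    print(pivot(long_rows, "country", "year", "population") == wide_rows)
    print(manual_unpivot_sql("population", "country", ["2000", "2001", "2002"],
                             "year", "population"))
```

The hand-written `UNION ALL` query gains one branch for every year column, which is the linear scaling Mark mentions, while the `UNPIVOT` form stays the same size.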
