Friendly SQL with DuckDB

2024/01/05

For decades, SQL has been the undisputed language of data. Its power and ubiquity are undeniable, yet so are its frustrations. Standard SQL can be verbose and rigid, often requiring repetitive boilerplate code, nested subqueries, and tedious workarounds for common analytical tasks. This increases the risk of errors, slows down exploratory analysis, and can feel counter-intuitive for modern data practitioners accustomed to the flexibility of data frame libraries.

What if SQL could be more intuitive, less error-prone, and more productive? This question is at the heart of DuckDB's approach to the language. DuckDB is pioneering practical, common-sense syntax improvements that address these long-standing pain points, making SQL more powerful and approachable. This article explores these features, drawing insights from a conversation with Alex, a forward deployed software engineer at MotherDuck, who uses DuckDB's modern syntax to solve complex data challenges daily.

The features discussed are demonstrated in the accompanying video using a Google Colab notebook. Analysts can achieve this setup seamlessly by setting a MotherDuck service token and using the md: connection string, which automatically installs and configures the necessary DuckDB extension.
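
Under the hood, that connection is just an ordinary DuckDB session pointed at MotherDuck. A minimal sketch of the equivalent SQL, assuming the `motherduck_token` environment variable already holds a valid service token:

```sql
-- Connect an existing local DuckDB session to MotherDuck.
-- The 'md:' path triggers the automatic install/load of the MotherDuck extension.
ATTACH 'md:';
SHOW DATABASES;  -- shared databases such as sample_data should now appear
```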

Accelerate Exploration with a Flexible Query Structure

A typical data analysis workflow begins with a simple question: what does the data look like? Standard SQL forces analysts to start with a SELECT clause, even when the first logical step is to identify the source table. DuckDB improves this workflow by allowing for FROM-first queries.

This syntax lets an analyst start by specifying the table, which is often the first thing they want to do. The video demonstrates this by running a query that begins with FROM sample_data.hacker_news.posts followed by a LIMIT clause to immediately inspect the table's contents. This small change aligns the query structure with the natural thought process of exploration.
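
A minimal sketch of that first exploratory query, in DuckDB's dialect:

```sql
-- No SELECT clause needed: a bare FROM implies SELECT *.
FROM sample_data.hacker_news.posts
LIMIT 5;
```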

Building on this concept, DuckDB also supports a more logical ordering of clauses. Instead of the standard SELECT ... FROM ... WHERE order, users can write queries in the order of execution: FROM ... WHERE ... GROUP BY ... SELECT. This not only improves human readability but also enhances the performance of AI-powered code completion tools. As Alex notes, providing the table context upfront gives AI assistants a much easier job of suggesting relevant columns and functions, leading to a faster and more accurate coding experience.

Eliminate Boilerplate with Smarter Aggregations

Just as DuckDB rethinks the overall query structure, it also streamlines one of the most common and repetitive parts of any analysis: aggregations. One of the most common sources of friction in analytical SQL is the GROUP BY clause. Standard SQL requires users to manually list every non-aggregated column from the SELECT statement, a repetitive task that is prone to error, especially in queries with many grouping dimensions.

DuckDB solves this with GROUP BY ALL. This simple command instructs the engine to automatically group by all non-aggregated columns present in the SELECT list. The video illustrates this with a query that counts Hacker News posts mentioning "DuckDB" by year and month. Instead of writing GROUP BY year, month, the query simply uses GROUP BY ALL, and DuckDB correctly infers the grouping columns. This eliminates redundant code and reduces the chance of syntax errors.
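
A sketch of such a query follows; the column names (`timestamp`, `title`) are assumptions about the sample dataset's schema:

```sql
-- DuckDB infers the grouping columns (year, month) from the SELECT list.
SELECT year("timestamp") AS year,
       month("timestamp") AS month,
       count(*) AS posts
FROM sample_data.hacker_news.posts
WHERE title ILIKE '%duckdb%'
GROUP BY ALL;
```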

Similarly, DuckDB offers ORDER BY ALL, which sorts the results by every column in the SELECT list in ascending order. This mirrors the intuitive sorting of a pivot table in Excel, where data is automatically ordered column-by-column. This feature is a convenient alternative to listing columns by their numeric position (e.g., ORDER BY 1, 2, 3), a practice that can easily break when columns are added or removed during query development.
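
For illustration, the two shorthands combine naturally; `monthly_posts` here is a hypothetical relation:

```sql
-- Sorts by year, then month, then posts, all ascending.
SELECT year, month, count(*) AS posts
FROM monthly_posts   -- hypothetical table or CTE
GROUP BY ALL
ORDER BY ALL;
```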

Simplify Complex Logic with Reusable Aliases

Beyond simplifying aggregations, DuckDB also addresses a common source of complexity: the rigid rules around column aliases. Standard SQL imposes a frustrating limitation on column aliases: an alias defined in a SELECT statement cannot be used in a subsequent WHERE, GROUP BY, or HAVING clause within the same query level. This forces analysts to either repeat the entire expression or wrap the logic in a subquery or a Common Table Expression (CTE), adding unnecessary complexity.

DuckDB removes this barrier by allowing aliases to be reused in subsequent clauses. If an analyst creates an alias like EXTRACT(YEAR FROM post_date) AS year, that year alias can be used directly in the WHERE and GROUP BY clauses. DuckDB intelligently resolves the alias as long as there is no ambiguity with an existing column name. This results in cleaner, more readable, and more maintainable queries.
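
Sketched out, with `posts` and `post_date` as assumed names:

```sql
-- The alias "year" is defined once, then reused in WHERE and GROUP BY.
SELECT extract(year FROM post_date) AS year,
       count(*) AS total
FROM posts
WHERE year >= 2020
GROUP BY year;
```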

The flexibility extends even further with sequential aliasing within the SELECT clause itself. Users can define an alias and immediately reference it to build a chained calculation in the next column. This allows for complex, multi-step logic to be expressed clearly in a single SELECT statement without the need for nested functions or subqueries.
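
A sketch of such chained aliases, with hypothetical `price` and `quantity` columns on an `orders` table:

```sql
-- Each alias is visible to the expressions that follow it in the same SELECT list.
SELECT price * quantity AS revenue,
       revenue * 0.19   AS tax,     -- reuses "revenue"
       revenue + tax    AS total    -- reuses both
FROM orders;                        -- hypothetical table
```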

Select Columns Dynamically with Data Frame-Style Syntax

While reusable aliases help manage complexity within a query's logic, another challenge arises when dealing with the structure of the data itself, especially in wide tables. Working with tables containing hundreds of columns often requires manually listing each one to select, exclude, or transform a subset. DuckDB brings data frame-style flexibility to this problem with the powerful COLUMNS function.

The COLUMNS function allows for dynamic column selection using expressions. Key capabilities include:

  • Excluding columns: COLUMNS(* EXCLUDE (col_a, col_b)) selects all columns except for the ones specified.
  • Replacing columns: COLUMNS(* REPLACE (col_a * 2 AS col_a)) allows for in-place modification of a column's value.
  • Pattern matching: Users can select columns based on patterns using regular expressions or lambda functions. The video demonstrates selecting all columns that start with "pop" from a dataset, a task that would be cumbersome in standard SQL.
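
Putting these together on a hypothetical `cities` table (column names are illustrative):

```sql
-- Drop a couple of columns from the output.
SELECT * EXCLUDE (id, created_at) FROM cities;

-- Rescale a column in place while keeping everything else.
SELECT * REPLACE (round(pop_2020 / 1e6, 1) AS pop_2020) FROM cities;

-- Regex match: every column whose name starts with "pop".
SELECT COLUMNS('pop.*') FROM cities;

-- Lambda variant, applying an aggregate across all matched columns.
SELECT min(COLUMNS(c -> c LIKE 'pop%')) FROM cities;
```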

This functionality dramatically simplifies data preparation and feature engineering workflows that were previously the exclusive domain of programmatic tools like pandas.

Improve Daily Workflows with Quality-of-Life Enhancements

Beyond these major structural improvements, DuckDB includes several smaller features that enhance the daily user experience.

  • Function Chaining: To avoid deeply nested function calls, DuckDB supports dot notation. A series of string manipulations can be written as string.upper().split() instead of split(upper(string)), which improves readability by presenting operations in a clear, sequential order.
  • Trailing Commas: DuckDB resolves the long-standing debate over leading versus trailing commas by supporting trailing commas. This makes it easier to reorder or comment out lines in your SELECT statement without causing syntax errors.
  • String Slicing: For users familiar with Python, DuckDB provides convenient bracket notation for string slicing. This offers a familiar and concise method for text manipulation to those coming from programming backgrounds.
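
All three conveniences can appear in a single statement; the literals here are arbitrary:

```sql
SELECT
    'friendly sql'.upper().string_split(' ') AS words,  -- same as string_split(upper(...), ' ')
    'DuckDB'[1:4] AS first_chars,                       -- 1-based, end-inclusive slicing
;   -- the trailing comma above is accepted
```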

The Future of SQL is Friendlier

The SQL enhancements in DuckDB are more than just syntactic sugar. They represent fundamental improvements that reduce cognitive load, prevent common errors, and make data analysis more productive and enjoyable. By listening to the needs of its users, DuckDB is actively evolving SQL to meet the demands of modern data work.

Features like GROUP BY ALL and the COLUMNS function demonstrate a commitment to developer experience, proving that a powerful analytical engine can also be user-friendly. As Alex concluded in the conversation, this is a community-driven effort. Users are encouraged to share their own SQL frustrations and ideas on the DuckDB GitHub repository, helping to collectively build a better, more intuitive future for the language of data.

0:00 Hello, welcome everybody to our first

0:05 Quack & Code live stream, something new that we are trying. The goal of

0:12 the session is obviously to quack

0:17 about a specific technical topic, and then to code a little bit so we have something pragmatic. It's just before the Christmas holidays, so we'll do

0:30 something pretty light today and talk a bit about SQL and the new features

0:37 that DuckDB brings to SQL. So if you're new to DuckDB, it's also a good session for you. We'll be sharing some code that you can follow along with during the live stream, so please let us know where you are from if you're joining us live. I'm based in Berlin, so it's pretty

1:04 late already, 9 p.m., but let me know if you're having breakfast or lunch. I mean,

1:12 you don't necessarily need to be in a specific time zone, but usually we have breakfast in the morning and lunch at noon. So, Greece, oh,

1:25 I see we have Belgium, Finland. Do

1:30 we have someone from Berlin? Nice to see a lot of you.

1:38 Someone based in Boulder, right. So, during the session I

1:45 always invite some guests, whether external or internal, and today I have an internal guest, someone also from MotherDuck: Alex. Alex, welcome! How are you

2:01 doing? Hi everybody, doing great, doing great. So Alex, can you do a quick intro? What do you actually do at MotherDuck? Oh sure, so I'm a

2:15 forward deployed software engineer at MotherDuck, which is a made-up title. It's kind of ridiculous, but what does it mean? It's a little bit of everything. I'm kind of the first person to talk with our customers in a lot of cases, so the first time we're meeting them, learning about their requirements, their use cases, and kind of brainstorming,

2:30 dreaming with them. But then I'm also kind of the first line of defense on the tactical troubleshooting, in terms of maybe here's a code snippet to solve what you need, maybe there's an integration that you need some help with, you know, working with a different ecosystem tool, or bug hunting on our own platform. So a

2:47 lot of fun. Yeah, that's a lot. So it's a bit more advanced than sales; it's something between a sales engineer and a solutions architect, but you're still sometimes doing the integration yourself. It's definitely a mix, and I think part of it is you kind of get to choose what you like, and I happen to

3:05 like the ecosystem side, so I kind of picked up a few extra tasks there. But the core of it is learning what the customers need, helping make sure we build what they need, and then helping them tactically as they use it. And what's your initial story with DuckDB? Oh sure, so my background

3:25 is industrial and systems engineering, and I spent nine years at Intel. While I was there I was basically part of what I'll call shadow IT: we complied with all the IT regulations, but we built all the stuff that IT didn't want to build, you know, like the kind of

3:40 intermediate solutions, the prototyping stuff. It's not the three-year plan, it's the one-week plan. As part of that we built a self-service analytics platform, designed to connect all the hundreds of on-premise databases that Intel still runs in all their factories. And the best way to combine all that data

3:59 together ended up being DuckDB. So I found it way back in 2020, and I hammered it with a million crazy SQL queries and it didn't break a sweat, you know, no problem. It has since gone on a hockey-stick curve, and I've been along for the ride. So I found it just with

4:15 a Google search, then just started using it, really liked it, tweeted about it a lot, and then I got tied in. Yeah, you do like Twitter, right? I've seen that. I do, I enjoy it, it's fun. I wish it were managed a little differently, but I like the

4:31 people I get to talk with on there. Cool. And yeah, that shadow IT thing, I think we could talk for an hour about this, but I'm really curious: do you think there are other companies that have dedicated teams like that? You know, we never called ourselves that; I kind of learned that's

4:55 what we were after I left. So we weren't exactly up front about being the shadow IT team, but yeah, makes sense,

5:04 right. I've learned since then that there are a lot of places where you need a spectrum of solutions. You know, if you're running your customer orders, the system that takes an order and submits it is going to be built by your corporate IT, and it's got to work 99.99% of the time.

5:19 Your reporting tools are frequently built by other teams. It can be a real spectrum: people building it themselves, people having a team that uses commercial software, or maybe an internal platform team that builds an internal platform, and we were kind of that internal-platform-team model. Yeah, that makes sense.

5:39 I mean, I've seen some of them kind of bootstrapping things, especially at large corporate companies, right; you need to take some shortcuts. But okay, you were not advertised as a shadow team. You know, we followed the rules; we had a couple of people that used to work in IT on

5:57 our team, so we had a couple of ringers. We weren't your typical shadow IT hacking stuff together; we were, I think, building something pretty neat. No, that's great. Yeah. And so,

6:13 I have a small SQL quiz, and Alex is

6:17 not freaking out, even though he doesn't know any of the answers. But it's also for you, the audience on the live stream. I think there are interesting things to know about SQL; at least I learned some when I made the quiz. So first, do you know in which year

6:37 SQL was developed? Oh, I should know better than I do. I'll go with roughly, I don't know, how about '77, 1977? Oh, that's really close. 1974. Okay, so 1974. And if you want to try, by the way, I have multiple questions around SQL, so if you want to test your knowledge, please don't use ChatGPT; we'll all know

7:08 the answer at the end, right? Just to test your culture around SQL. So yeah,

7:15 SQL was first developed in 1974. And do you know which popular song was released that year? I'm going to give you a hint:

7:28 Sweden, a Swedish band. I do not. Oh, the most popular song in that year? Oh man, well, ABBA is a pretty good hint; they're the Swedish band. We're going to go close. What do I know from ABBA songs? I only know a few, so we'll go with "Dancing Queen" by ABBA. Oh no, "Waterloo" was

7:55 released. So now, whenever you hear that song, you're going to say: that's the year SQL was released. I'm not sure it's going to be a hit at a cocktail party, but that's how to remember it anyway. And so, do you know where SQL actually comes from? Oh, I think I might be a

8:17 big enough data person to know that. Was this IBM, System R, doing this? Yeah, exactly. So IBM, and the first database

8:29 that they were building was System R. And do you know the first name of SQL? That's a fun one. So before it was named SQL, at one point it was the word SEQUEL spelled out, like S-E-Q-U-E-L. My

8:48 God, you're a mystery book; I thought I could trap you. So yeah, SEQUEL,

9:00 spelled out as you said, like the word "sequel" in the movies, right. And it stood for

9:07 Structured English Query Language, and

9:11 then they shortened it to

9:16 SQL. And do you know what the name of the standard version of SQL is, and who is maintaining it? I know it's ANSI SQL. ANSI does,

9:30 like every three years or so, they put one out, but I have no idea what ANSI stands for. Yeah, ANSI is a private nonprofit

9:40 organization, and you also have ISO,

9:44 and they basically work together on those standards. So regarding this standard, do you know when the last one was released, the SQL standard? They did one in '23, I think; it's got the new graph syntax and stuff. Yeah. And so, actually, before that,

10:08 I mean, there was one in 2019 as I can see, but mostly the last big one was in 2016. So between 2016 and this year we didn't have that much, which is pretty crazy, right, because if you think in terms of the

10:30 timeline, the world changed a lot. That's a lot, yeah. So just to give you an idea,

10:37 Snowflake was publicly launched in 2014, so they've basically been, you know, living on a certain standard for quite some time, and that also explains why we have quite some divergence between the different data warehouses, because the standard hasn't moved a lot. What do you think about that? Yeah, my quip about this is that

11:05 it stands for Structured Query Language, not Standard Query Language. We definitely can't get everyone to agree, and it's kind of been that way all along. I think it's improving over time; people are starting to realize that this is not super fun, and there are some new tools coming out to translate back and forth, but yeah, it's

11:25 tough. Yeah. And so, if people want to read about it, I just put in the comment

11:37 section the blog post about the new

11:43 SQL standard for this year, and as you mentioned, there is some stuff around JSON which is interesting too, right? That's true, I think they did finally formalize some of the JSON stuff. Yeah. And it's funny, because I guess a lot of companies saw the need to have more power in

12:08 SQL, you know, for people ingesting unstructured or JSON kinds of data into their relational database. But it's also interesting to see how much it lags, with standards coming through late, right, because a lot of databases have been supporting JSON for a couple of years.

12:30 So that's pretty neat. Yeah, I think the way the standards folks do it is they wait for a couple of people to build it, then they see how similar those are, and then they kind of build a standard after the fact a little bit. So it doesn't really prevent the fragmentation, but at least you kind of

12:47 learn what's the best way, so once it's finally standardized it's really good. But it does mean that people do their own thing for a little while. Yeah, indeed. And so,

12:58 coming back to my quiz, what else do I have? Yeah, I think we covered a bit of that around the intro. I hope everybody learned something new, apart from "Waterloo" by ABBA, right? But yeah, moving on, let's

13:20 talk a bit about tooling and setup around DuckDB and

13:27 SQL. So what's your go-to thing when you use

13:34 DuckDB? Yeah, when I'm implementing a system with DuckDB, I love the Python client, so that's my go-to if I want a long-running thing or a really complicated workflow; that's my favorite. A lot of what I'm doing these days when I'm helping folks out is trying to do the

13:48 simplest thing possible, so these days I use the CLI a lot, because that's kind of the most core part of DuckDB, the command-line tool. I usually run that either in a basic terminal, but I'll also run it in VS Code, so I can have a SQL file where

14:04 I edit all my commands and then just kick them off with a one-liner in the CLI. That way I at least get some syntax highlighting. But yeah, that's what I use. When did you start to use the VS Code thing? Because I

14:24 kind of discovered this workflow when I started to work with DuckDB in February, but I think you were using it already before. Oh, maybe not, probably later than that. I think I was not a huge terminal user; you know, at my previous company I was a Windows person, so you

14:43 know, the command prompt is not as nice as the terminal, so I used it when it was required. But nowadays I'm on

14:51 a Mac, so I'm getting more and more used to that. You know, VS Code I have used for a couple of years, but I hadn't done as much SQL in there; I was mostly just using it for Python stuff, JavaScript stuff. Yeah. Let me quickly show the audience what we are talking about with VS

15:09 Code; this is something we showed in another video. You see my screen,

15:15 right? Because I don't see you anymore. Yep, yep. So what we're talking about is

15:22 that it can work with any terminal or any editor, but the idea is that you have a SQL file and then you open the DuckDB CLI down here,

15:35 and once it's there, you can basically just work. What's cool with that is that you have formatting and syntax highlighting, and also GitHub Copilot. I mean, here I don't need my GitHub Copilot because I have my human copilot, right? I like it. The idea is that you basically have all

15:54 the autocompletion, all the features of the terminal, and what I do, as you see, is send commands down; it's just a shortcut. You can set up custom

16:08 shortcuts in VS Code and then just send it over, and here, as you see, I'm just reading a public CSV, and that's it, I can work like that. I do like to work like that when it's something I want to easily repeat, you know, a SQL file, because I found,

16:29 like, we're going to talk about notebooks, but notebooks have their pros and cons for learning and sharing. But yeah, I do like pure SQL, and it also embraces the simple setup of, you know, DuckDB itself, right? Yeah, and I

16:47 didn't even have that advanced a setup; for me, I just run one command, like .read and then the name of my SQL file, and it'll just kick them off in one shot. But yeah, you can select multiple lines and so on; everything works. Yeah,

17:06 it works pretty great. Yeah, so

17:10 that's one possibility. We also have people using DBeaver; I think you're using DBeaver a bit, or not? Yes, yep, I like DBeaver if I'm doing a lot of work with tables and I really want to do, you know, creating tables, removing them and all that; it's much nicer to have that

17:28 exploratory pane on the left-hand side. I also do some work with Postgres, and DuckDB can read Postgres, so it's nice to have a Postgres client right next to my DuckDB. So I do use DBeaver for that; if I'm doing more with physical tables, I'll use DBeaver.

17:47 Yeah, that's cool. And then we have other stuff like Harlequin, I'm not sure how to pronounce it, you know, the terminal editor for DuckDB. We probably

17:59 should cover that in a specific video or live stream; it's pretty cool. And then we have notebooks, which also work, you know, great. Today, to talk more about the specific SQL features of DuckDB, we're going to use Google Colab, simply because it's super easy to share and to set up; you basically

18:24 don't need anything. Have you been using it for sharing and teaching a little bit? I work part-time for DuckDB Labs as well, in addition to MotherDuck, so I do documentation for them, and Colab is a nice way to have, you know, live documentation, where you'll actually

18:44 write it out in a markdown file, but then: hey, by the way, if you want to go run it, here's a Colab link. So that's probably how I got started with it. We didn't use it internally at Intel; they were, you know, a Microsoft shop, so no Google Colab there. But I've

18:58 been enjoying it on the open-source side; it's just great to share that link: hey, by the way, check it out. Yeah, definitely. And so

19:07 I'm going to share the Google Colab with people, if you want to join directly, and then I'm going to share my screen and we're going to dive into that. So Alex, you actually wrote

19:25 two blog posts about the SQL

19:29 features of DuckDB, right, on the DuckDB website. And so

19:38 I went through those two blog posts, and I've seen a lot of things that I wasn't aware of, that I was not using, so it's really interesting. We're going to see that. So first, there is stuff that was happening that I didn't realize was special, because it was basically not part of the standard. And then there

20:05 are functions that I

20:10 was not aware of, but it's also kind of challenging to incorporate that into your muscle memory, no? Sure, yes. For those functions, do you have a highlight, like, what's your favorite function, before diving into the notebook specifically? Oh, my favorite

20:35 kind of DuckDB-specific SQL thing is GROUP BY ALL. So instead of having to group by individual columns, you can just tell it: handle my GROUP BY automatically. So nowadays I don't even worry about my GROUP BY; it's really nice. Yeah. And I think it's really interesting, because

20:54 there are a lot of people starting their SQL journey with DuckDB now, you know, even at university and so on, because it's super easy to install. But

21:07 what is fun is that someone who learns SQL with DuckDB might say: oh, you know, of course there is GROUP BY ALL, that's totally normal. You see where

21:18 I'm going: there is actually stuff done for you, which is great, but I think we should really recognize the effort that went in there, because it just seems so natural. Yeah,

21:36 that one in particular: there is at least one other database company that has picked it up and implemented it; they're named after a type of precipitation. But they have GROUP BY ALL as well, because we want to make it a standard. You know, you can't necessarily make it a standard with just one database doing it, but if

21:54 there are a couple, I think we can get there. Yeah, indeed.

22:01 So we're going to start by just installing a couple of packages. Can you tell me the difference... I'm going to play dummy so that everything is clear for the audience. Oh sure. Yeah, can you tell me the difference between the duckdb-engine Python package and the duckdb Python package? You

22:26 bet. duckdb-engine is our SQLAlchemy dialect. SQLAlchemy is a Python ORM, an object-relational mapper; it's really good for, you know, using Python to define your schemas instead of SQL, if you prefer that. It's also good for migrations, and it integrates well with a lot of other tools. We're using it here because it powers JupySQL, and

22:46 that's how JupySQL connects: with the SQLAlchemy driver. Because SQLAlchemy supports so many databases, it makes it easier for things built on top to connect to all the different databases. So that's the advantage we're using it for here. The creator of duckdb-engine, Elliana, has said: oh, I

23:04 wish I could rename it, because it's a tough name that's not exactly the standard SQLAlchemy convention. But we use it, we love it, and we really appreciate the work that she put into building it. Nice. So

23:16 yeah, let's just run that. And after we do the imports, we have DuckDB... why do we actually need pandas here, in this case? JupySQL can return things directly as pandas to you, and that ends up being nicer in a Jupyter notebook, mostly for looking at the table output; it's formatted nicer.

23:37 But it's actually faster as well: because pandas is written in C and C++, and so is DuckDB, it can transmit the data all at the C++ layer and kind of turn it into Python at the last second. Whereas if you use the Python DB API, like fetchall, it has to convert every individual object

23:56 into a Python object, and I love Python, it's flexible, but the objects have an extra layer of abstraction over a C type or something like that. So it's faster if you use pandas. Okay, yeah, that I actually didn't know, but that makes total sense. And so, yeah, here we

24:18 have just a couple of setup and configuration steps, and what we are going to use is the SQL magic command, so that we can use pure SQL commands. Ah, wait, I did something really

24:40 bad: apparently people don't have access

24:48 to it; you must have the link to access it. Yeah, I see:

24:54 "anyone with the link". All right, it should be solved, sorry for that, and thank you, Nick, for pointing that out. So you can check back the link I sent, if you're on YouTube or LinkedIn, and follow along with

25:15 the Colab here. So yeah, I was talking about the magic command for SQL. It's just that, since here we're using a Python runtime, instead of wrapping SQL into, you know, an ugly text string, we can just have pure SQL directly over

25:38 there. So yeah, that's basically it. And then we're going to do one thing: yeah, we're

25:49 still going to use MotherDuck, but we're not going to use the MotherDuck engine; we're basically using a shared database. So MotherDuck has a feature for shared databases: it's basically a DuckDB database in the cloud, and we have a couple of open datasets over there. So you can actually

26:09 create an account on MotherDuck; it's free for the moment, and super fast. So if you want to have access to those shared datasets, just create your account, and then you can get your service token in your profile, and that's what we are going to feed in here. So

26:34 basically I'm just passing this, so I

26:38 passed my service token from my MotherDuck account, and now I'm just connecting to MotherDuck using this connection string. And yeah,

26:48 here it's basically the same as if you would use a local DuckDB, right, Alex?

26:56 But we just have the md:, the md

27:01 colon. Can you explain a bit what's happening behind the scenes with the extension? Sure. So in this case we've partnered really closely with the founders and creators of DuckDB over at DuckDB Labs, and they've added a special hook into DuckDB itself for us. Typically you'd be connecting to DuckDB with a path to where your file is on

27:20 your local file system, you know, here's where my file is. If you actually start with md:, there's just a special if statement that says: hey, if you start like that, what you actually mean is you want to connect to MotherDuck, and what we're going to do is automatically install a MotherDuck

27:36 extension for you in an extensions folder, the same place the other DuckDB extensions go when you install those. That extension is going to open up a connection to the MotherDuck server, and when we make that connection, we'll go ahead and send you the information about your tables, your views, and what's in your

27:54 database, and then you can go ahead and hit it with queries. The nice thing about that is that some of your queries can run on our server in the cloud and others can run entirely locally, if you've already copied the data locally, so you get that nice flexibility. Yeah, thank you. So yeah, just to give context for

28:14 Host: Thanks. Just to give context for those not super familiar with DuckDB extensions: DuckDB has this notion of extensions, which are basically packages you can install into DuckDB. There are supported extensions from DuckDB Labs that you can simply INSTALL; for example, the httpfs extension connects to HTTP(S) files

28:37 hosted on S3 or Google Cloud Storage, and there's also an extension for managing AWS credentials, so you can just run INSTALL aws and have it loaded. The beauty here is that everything happens automatically: you don't have to install or load any MotherDuck-specific extension, and now we

29:00 are connected to MotherDuck. So what we're doing here is simply querying the sample_data database.

29:11 If you take a quick look at

29:18 MotherDuck's documentation website, you can see what's actually available

29:30 under the public datasets section.

29:36 On the documentation site, under sample datasets and queries, there's information about the different sample data we host. The sample_data database contains multiple datasets that are available as soon as you create a MotherDuck account, so you can play with them, and

30:03 there are details about those datasets plus a couple of example queries. That's just to give you

30:12 quick context about where these datasets

30:17 come from. Coming back to our notebook: if we run this, we see the

30:27 first bit of DuckDB magic, the FROM-first statement. I'm not sure

30:36 why this hasn't been done before, or at least none of the databases I know support it. Do you have any hint on that? Alex: I think there may have been one that did it previously, possibly a dialect that was part of a BI tool, so it may have

30:55 existed before (credit to them for being ahead of their time), but it's not widespread, and there's no other popular system I know of that has it. Part of this goes back to that fun fact you mentioned, that SQL used to be spelled SEQUEL: it was designed to be English-like, more like a

31:12 sentence, "I want to get these columns from this table," so it was meant to read like spoken word. But it's always been backwards if you actually think about it: every intro SQL tutorial shows you the order of the clauses, then their execution order, and then how the two don't match.

31:31 Now they can match. How about that? Host: Yeah. For me it's just crazy how many times I write a SELECT * on a table and then a LIMIT, just to get a grasp of everything. Even if you have the metadata, you want a quick view of how the actual data looks.

31:52 That's really the first thing I do.

31:57 I think that's one of the biggest things I like about DuckDB: it's really simple. And the next thing is that we can actually put the SELECT after the FROM. That's still pretty new; it was released around the same time as the FROM-only form, so it might be two years old by

32:16 now. The basic benefit is that you can reorder the clauses: you don't have to skip listing which columns you're grabbing, you can still put that in, just in a more natural order. Alex: The other really nice thing is that it helps

32:33 with those AI overlords that are coming. As soon as you give the table information first, autocomplete has a much easier job figuring out what you're probably going to do, so this even helps our autocompletion. Host: Yeah, that's a really interesting point. There are patterns for

32:51 writing documentation that is bot-friendly, that helps an AI parse and understand things, and I think it's the same idea for a programming language. In this specific case, the AI can pull all the metadata directly from the

33:15 table and suggest what comes next, so this works basically the same way.

33:24 No big surprise there. Next we have the thing you mentioned earlier: GROUP BY ALL and ORDER BY ALL. Typically here we'd have a GROUP BY, and instead of

33:41 specifying each column, we just write GROUP BY ALL. Is the behavior the same for ORDER BY? Alex: It's similar. ORDER BY ALL orders by every column in the row, ascending by default, sort of like how an Excel pivot table sorts: it always

34:01 sorts ascending, column by column. You can adjust that: you can add DESC to flip it around, and you can even say NULLS FIRST or NULLS LAST if you want to be picky about where your nulls end up. It's really handy. A lot of people get around this by writing

34:17 ORDER BY with the column numbers, one, two, three, but when I'm playing with a query for a long time I keep deleting and adding columns, and I always have to go back and change those numbers. I've seen a query, partly autogenerated, that had the numbers one

34:33 through twenty listed in a GROUP BY and an ORDER BY, except number 16 had been removed. That is the least comprehensible thing ever: group by everything except column 16. What a mess. I much prefer this way. Host: Yeah, and here we're playing with really

34:52 simple examples, but in practice, in production, you often have more than three columns in your final dataset, so the problem is much bigger than it looks here. I was already using GROUP BY ALL, but I wasn't aware ORDER BY ALL

35:15 existed as well. Alex: Oh, awesome.

35:25 Host: Next up: column aliases. I think this is the same kind of behind-the-scenes thing I mentioned earlier. I had completely forgotten that, normally, if you define an alias in the SELECT statement, it's not available in the other clauses, in GROUP BY, HAVING, or

35:55 WHERE, so you have to repeat those expressions, which is tedious and more prone to human error, because you define the same thing in different places. In DuckDB, reusing the alias just felt natural, to the point where I stopped noticing it. Then

36:19 recently I went to another cloud data warehouse, because, you know, I like shopping around,

36:28 and it wasn't there, and I thought: but that's obvious, surely that works! The magic kind of disappeared. Alex: Yeah, this one's a funny one. It goes back to the fact that we

36:48 database folks love high precision, and in theory there's no way to know for sure whether the name you're using refers to the alias or to an underlying column. So to be safe, most systems make you not use the alias at all. But really, if there's no conflict, if there isn't already a column named year or month, the

37:09 database should know. So DuckDB will resolve it, but only when there's maximum certainty. If there's any ambiguity, it errors out and says: hey, please be explicit. When there's no ambiguity, the database knows you just renamed something to year,

37:25 and it lets you use that name wherever you want. Host: Yeah, definitely. This is one I was using without realizing it, until I stepped away from DuckDB for a while. Alex: And once you step away, you have to wrap a whole nested

37:43 CTE or a subquery around it just to rename things, or repeat the ugly expressions throughout the rest of your query. There's no really good workaround. Host: Yeah, it gets really verbose without it.

38:04 I didn't run this next one yet. By the way, this is a Hacker News dataset; I believe it has quite a lot of data, up to May 2023. The query counts the number of

38:21 appearances of "duckdb" in the titles of Hacker News posts.

38:25 You can see that by October 2022 it was already mentioned 10 times. I think that was the start of the hockey stick we mentioned earlier. Alex: Yeah,

38:41 the low part of the hockey stick is probably the mid-2020 timeframe. Everyone decided: I'm stuck at home, locked inside, this whole global pandemic thing, let's check out databases. I like the way people think. Host: True, definitely true. Next: reusing column aliases within the SELECT list.

39:05 It's kind of the same vibe, or maybe it's...

39:12 no, what am I missing here? Alex: It's very similar. The main difference is that the previous one lets you use a SELECT alias in your WHERE clause; this one lets you use an alias in the next expression of the same SELECT list, which is really nice. Host: So here you're chaining operations: one alias feeds another, then another.

39:34 That's even a bit more advanced; maybe pushing it a bit too far, since you can create a lot of complexity with it. Alex: You can, you can. This one I think Redshift supports, so it's been around a little while. It's really convenient; it's one of those where I saw someone post,

39:55 "I jumped from Redshift to somewhere else, now this isn't there, and I hate writing SQL now." They didn't say exactly that, but it was a very extreme reaction, because it's another case where the database should know what you're asking for with

40:09 no ambiguity and should just figure it out. We're all about that. Host: Indeed. I like this next one because, and I don't think I mentioned this yet, DuckDB kind of made peace between me and SQL. I was more of a data frame person. Actually there are two tools that made

40:32 peace between me and SQL, and DuckDB is definitely one, because

40:36 before that, the only time I used SQL was with cloud data warehouses, and the interaction there, for a software engineer, makes it complicated to write unit tests or to test things locally. DuckDB provides

40:56 exactly that by being simple to run. Now I use it not only for data pipelines but for everyday tasks: I want to inspect a CSV or a Parquet file locally. I've started to use SQL more as a toolkit than as a language to

41:16 program in. What do you think the second tool is that made peace between me and SQL? Alex: I don't know... maybe Ibis, with its data frame and SQL hybrid? Host: That could have been, but no: dbt, or really any framework

41:37 that helps close the gap, bringing unit testing, variables, and code reuse to SQL. The two together are really nice. It feels like back in the days when I was developing Spark Scala pipelines with functional programming, and

42:00 coming back to SQL with those environments around it feels good. Anyway, that was a little parenthesis. All of that to say: function chaining reminds me of typical Scala, chaining data frame operations, which is why I like this one.

42:21 So what we do here is: we take the string, we uppercase it, then we split it into

42:28 a list, right? Alex: That's correct, yep. Host: And then

42:33 we recombine it, with a

42:37 dot between each word, and then the concat

42:42 turns it back into a string. Is that correct? Alex: Almost; the concat is at the very end, it just adds a period at the end. Host: Ah, okay, that makes sense. Alex: It's a very contrived example; it took some work to come up with one on the fly.

43:03 But it's pretty neat. Using the dot works just like in other languages that allow it: the first argument of the function is whatever sits right before the dot, and it keeps you from having to nest everything deeply, which is,

43:21 in my opinion, super convenient. This was probably the most controversial of all of these when I talked about it on Twitter. People said: oh my gosh, how am I going to parse this in my SQL tool that parses SQL? And yes, it is a new trick for SQL, that's true. But I haven't heard

43:39 any pushback from people actually using it; it's very handy. On the system-builder side I can see it's more complex: it's tougher to support DuckDB syntax if you have to parse it now. But it's just much cleaner to look at.

43:59 Host: And you basically make friends with data frame people like me, so I think that's a pro. Alex: I used it too, though I'm more of a SQL person to start with. Host: Next, EXCLUDE and REPLACE. If

44:13 you want to select everything except some columns, this is pretty handy. And then dynamic column selection: here we run a

44:30 lambda function over the column names to pick any column

44:38 that starts with "pop". I think that's really useful. I've used it once or twice when I had a really large table I wasn't sure about: you run DESCRIBE and there's a lot of stuff. When you're doing data mining you often have no idea what data you're dealing with, so

45:01 just searching on a keyword and returning the matching columns helps. Here we see that we get the population-related columns. Alex: Yeah, it's really great if you're generating columns automatically in some way, say one-hot encoding or other machine learning prep, where I've

45:21 got many similarly named columns for a prediction task, maybe a thousand of them. It means a lot of those workflows can move out of the data frame world.

45:36 Where I used to pull the list of my columns, text-search that list, and then apply some operation to the result, you can now do it all right in SQL. Host: All right, PIVOT and UNPIVOT. This is more of an Excel thing, I think. I never used PIVOT and

45:56 UNPIVOT much. What about you? Alex: I ended up

46:01 using it a decent amount, because the group I was in handled supply chain, weekly production scheduling, and if you have a schedule it makes way more sense to look at it pivoted out, where every column is a week or a month. It's just so much more visually

46:20 appealing that many, many things were structured that way. But it's always best to do it as the very final step, since working with pivoted-out columns is always messier in SQL. So this is a great final, presentational step, just pivoting outwards. That's the

46:37 main use case for me. The example here with years is the same idea: it gives you a very quick visual read of what you're looking at.

46:50 Host: That makes total sense. As I said, I'm mostly a data frame person, so that's a different background, but I know people coming from Excel or Google Sheets use this a lot. Those people often don't

47:11 use SQL at all, and the pivot feature in Excel is pretty popular. Alex: Super popular. I call myself a reformed Excel guru: my gateway into programming was Excel VBA. That was the first time I realized, wait, I can just write this, and I can just

47:31 delete files on the computer? That was news to me at the time. I thought it was really cool, I got really into it, and that's what convinced me to actually go learn programming. I didn't do much of it in school; it just got me excited: oh, you

47:48 mean I can do anything? Okay, this is pretty sweet. So, that being said, I respect the power of Excel. Some of it is just so good, and pivoting is one of those features: it lets you use your visual, precognitive ability to look at something and see the problem. Host: Yeah,

48:07 it's easy to underestimate that if you're not playing around with Excel all the time. I think Excel still has its place. I'm curious what the people watching think: are you

48:20 a fan of Excel or not? There are really strong opinions depending on your profile, whether you're sourcing data from Excel or you're an actual Excel user. Alex: I've been on both sides of that. It's hard, I'll very much agree. Host: All right, string slicing.

48:42 That's fun. Again, it's a bit of a bonus, I would say.

48:49 Where does it even come from? But you can slice strings by position: here I drop the last two characters, so I don't love "DuckDB" anymore, I love "Duck".

49:15 Host: The trailing comma is another one of those, like GROUP BY ALL, that just quietly happens for you, until I hit a wall on another system:

49:34 wait, of course, I shouldn't have a trailing comma. How does that work again? Alex: Yeah, absolutely. This is just the folks building DuckDB being really familiar with their parser; they were able to make it dance. This is one of those where there were a lot of places

49:53 to change: your SELECT clause, your WHERE clause, everywhere. But basically anywhere you can put commas, it tolerates a trailing comma at the end and won't be mad, because the computer should be able to figure this out. We know you don't actually mean "comma, then nothing." Just

50:11 another common-sense, let-the-computer-do-the-work situation. Host: Just to clarify for the audience: normally, in standard SQL,

50:24 this would be an error. And typically the problem appears when you're exploring a query and

50:34 you comment out some columns; you can easily end up with a trailing comma. That's where it's handy to just not think about it. Again, it's something I initially knew SQL required, and it simply disappeared from my mind.

50:59 Until I tried another system and hit the wall again. Alex: One last funny thing on that: it taps into another age-old argument in SQL land, trailing commas versus leading commas, where people have very strong opinions. I think the main reason leading commas are preferred

51:21 is that you're usually editing the last couple of columns and you want to avoid the trailing comma error. I used to be a leading-comma guy; all my old queries have leading commas. But I'm a changed man: I've been won over to the trailing-comma lifestyle. So

51:38 try it, you might like it. Host: I'm curious: are you, in the audience, putting the comma in front for that reason, or at the end? Drop your SQL style in the comments so we can solve the mystery. We're almost out of time, so I'm going to accelerate, but anyway we're near

51:58 the end. This next one is actually a

52:02 duplicate; we're running in circles. And we have two more: COLUMNS with a regular expression. We did that too, though I think we used

52:15 a lambda function rather than a regular expression earlier. So that works as well; COLUMNS is pretty powerful. And this last one we covered too. I promise I didn't write the notebook drunk; maybe I was a bit tired. Alex: That's different. Host: Okay, no problem, I'll just fix it. So we have

52:36 ten, which is actually a nice number, better than twelve. Ten things that make SQL friendlier.

52:47 That's about it for our SQL journey

52:53 with DuckDB. A little quiz to

52:58 finish: can I ask you again when SQL was invented? Alex: Oh, it's a real quiz now. I think it was '74. Now I know! Host: Yeah, exactly. Cool, at least you

53:16 learned something from me too. The bottom line: go through those queries, the notebook, and these functions, because they're really there to help you out and minimize code. It's also important to mention that some of them make the code

53:35 more readable, because they simplify things, while others work behind the scenes, the kind of magic we saw with the trailing comma or GROUP BY ALL. If you get used to them, or you learn SQL for the first time with DuckDB, it's worth knowing these are things the DuckDB

53:56 team has put in place specifically to make SQL friendlier. Anything you want to add? Alex: I would just say: if there's more you can think of, let us know. Open a discussion on the DuckDB open-source repo. This is an area where we're happy to lead the

54:16 charge and do something people haven't done before, the "SQL should be able to do X, but it doesn't today" kind of thing. Tell us your stories of pain and woe, the "I always get this dang error" moments, and maybe we can fix it, maybe we can solve it for

54:36 everybody. [Music]

FAQs

What SQL features make DuckDB friendlier than standard SQL?

DuckDB adds several quality-of-life SQL improvements. GROUP BY ALL and ORDER BY ALL eliminate manually listing columns. FROM can appear before SELECT for a more natural reading order. Column aliases can be reused in WHERE, GROUP BY, and HAVING clauses. Trailing commas are allowed everywhere. And function chaining with dot notation lets you write column.upper().split(' ') instead of deeply nested function calls. These features reduce common SQL frustrations while keeping queries clear. Learn more in our DuckDB getting started guide.

How does DuckDB's FROM-first syntax work?

DuckDB allows you to write FROM table_name SELECT columns instead of the traditional SELECT columns FROM table_name. This matches the logical execution order of SQL (the table is resolved before columns) and makes autocomplete and AI assistants more effective since the table context is known before column names are needed. You can even write FROM table_name alone as a shorthand for SELECT * FROM table_name, which is especially useful for quick data exploration.

What are DuckDB's dynamic column selection features like EXCLUDE and COLUMNS?

DuckDB provides flexible dynamic column selection. SELECT * EXCLUDE (column_name) lets you select all columns except specified ones. SELECT * REPLACE (expression AS column_name) substitutes a column with a computed expression. The COLUMNS() function accepts regular expressions or lambda functions to dynamically select columns matching a pattern. For example, COLUMNS('pop.*') selects all columns starting with "pop". These features are especially useful for wide tables or automatically generated schemas.

What tools can you use to write and run DuckDB SQL queries?

DuckDB supports many development workflows. You can use the CLI directly in a terminal, or pair it with VS Code where you write SQL in a file with syntax highlighting and send commands to the DuckDB CLI below. DBeaver provides a GUI with table exploration panels, which is helpful when working with physical tables and multiple databases like Postgres alongside DuckDB. For sharing and teaching, Google Colab with the %%sql magic command offers a zero-setup notebook environment.
