DuckDB WASM: Run SQL Analytics Directly in the Browser

2024/01/19

TL;DR: Explore WebAssembly (Wasm) use cases for analytics, see DuckDB running in the browser, and build a Firefox extension that displays Parquet file schemas using DuckDB-Wasm.

What is WebAssembly?

WebAssembly (Wasm) is a binary instruction format that enables running code written in languages like C++, Rust, and Go in web browsers. First released in 2017, it's now powering applications like Figma, Photoshop Web, and Disney+.

Key concepts:

  • Sandboxed execution: Code runs in an isolated environment
  • Near-native performance: Often much faster than JavaScript for compute-heavy tasks
  • Portable: Runs in browsers but also in Docker containers and edge environments

Why Wasm Matters for Analytics

Modern laptops have incredible computing power that's often underutilized:

  • A MacBook Air today may have more CPU and memory than the remote server you develop on
  • Running analytics locally eliminates network latency and reduces cloud costs
  • Parquet files enable efficient column selection and predicate pushdown over the network

Combining Wasm with DuckDB means:

  • Zero installation: Users can run SQL queries directly in their browser
  • Local compute: Process data on the client without server round-trips
  • Privacy: Sensitive data never leaves the user's machine

Real-World Examples

  • Evidence.dev: BI dashboards using DuckDB-Wasm for instant filtering and aggregation
  • TensorFlow.js: Train ML models in the browser using WebGPU for GPU acceleration
  • Docker + Wasm: Run Wasm containers alongside traditional containers

Demo: Parquet Schema Browser Extension

Christophe Blefari built a Firefox extension that displays Parquet file schemas when hovering over files in Google Cloud Storage:

The Problem: Checking a Parquet schema traditionally requires:

  1. Download the file (possibly gigabytes)
  2. Start a Python environment
  3. Run pandas/pyarrow to read the schema

The Solution: A browser extension using DuckDB-Wasm that:

  1. Listens for mouseover events on GCS file links
  2. Sends a message to the extension's background script
  3. Reads only the Parquet metadata with DuckDB-Wasm (no full download)
  4. Displays the schema in a popup panel
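
Steps 1 and 2 can be sketched as two small functions. This is a minimal illustration, not the extension's actual code: the message type "parquet-schema", the function names, and the ".parquet" suffix check are assumptions made for the example. The browser calls are injected as parameters so the sketch also runs outside an extension; inside one, sendMessage would be browser.runtime.sendMessage and the listener would be registered with browser.runtime.onMessage.addListener.

```javascript
// Content-script side: build a mouseover handler that forwards the hovered
// link's URL to the background script. `sendMessage` is injected so the
// sketch also runs outside a browser; inside an extension it would be
// browser.runtime.sendMessage.
function makeHoverHandler(sendMessage) {
  return function onMouseOver(event) {
    const link = event.target;
    // Assumption: Parquet files are recognizable by their link suffix.
    if (!link || typeof link.href !== "string" || !link.href.endsWith(".parquet")) {
      return null;
    }
    // "parquet-schema" is a hypothetical message type.
    return sendMessage({ type: "parquet-schema", file: link.href });
  };
}

// Background side: answer schema requests. `readSchema` is a placeholder for
// the DuckDB-Wasm query; it may return a promise of column descriptions.
function makeMessageListener(readSchema) {
  return async function onMessage(message) {
    if (!message || message.type !== "parquet-schema") {
      return undefined; // not a schema request; ignore it
    }
    const columns = await readSchema(message.file);
    return { file: message.file, columns };
  };
}
```

Keeping the browser APIs out of these functions is only a convenience for testing; the real extension may well call them directly.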


// Initialize DuckDB-Wasm (bundle, worker, and logger setup omitted)
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
const conn = await db.connect();

// Query the Parquet schema without downloading the full file
const result = await conn.query(`
  SELECT * FROM parquet_schema('gs://bucket/file.parquet')
`);
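
The transcript mentions that reading from Google Cloud Storage currently means pointing DuckDB's S3 endpoint at GCS, with credentials needed only for private buckets. A hedged sketch of building those statements follows. The helper names (sqlString, gcsSetupStatements, parquetSchemaQuery) and the credentials object shape are invented for illustration; the settings s3_endpoint, s3_access_key_id, and s3_secret_access_key and the parquet_schema function are DuckDB's own.

```javascript
// Escape a value for use inside a single-quoted SQL string literal.
function sqlString(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Statements that point DuckDB's S3-compatible client at Google Cloud Storage.
// `credentials` (HMAC key id and secret) is optional; public buckets need none.
function gcsSetupStatements(credentials) {
  const statements = ["SET s3_endpoint = 'storage.googleapis.com'"];
  if (credentials) {
    statements.push(`SET s3_access_key_id = ${sqlString(credentials.keyId)}`);
    statements.push(`SET s3_secret_access_key = ${sqlString(credentials.secret)}`);
  }
  return statements;
}

// Query returning one row per schema element with its name and physical type.
function parquetSchemaQuery(fileUrl) {
  return `SELECT name, type FROM parquet_schema(${sqlString(fileUrl)})`;
}
```

With an open connection, each setup statement would be passed to conn.query in turn, followed by the schema query for the hovered file.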

Key Advantages

  • Instant schema inspection: No file downloads, just metadata
  • Minimal bandwidth for schema checks: Parquet stores the schema in the file footer, so only that small byte range is fetched
  • Works with any cloud storage: S3, GCS, Azure Blob (with credentials)
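
To see why a schema check is cheap, recall the Parquet layout: a file ends with the footer metadata, then a 4-byte little-endian footer length, then the magic bytes "PAR1". A reader can fetch the last 8 bytes, compute where the footer starts, and request only that range. The sketch below shows that arithmetic; DuckDB does the equivalent internally, and the function name footerRange and its return shape are made up for this example.

```javascript
const MAGIC = "PAR1";
const TRAILER_BYTES = 8; // 4-byte footer length + 4-byte magic

// Given the last 8 bytes of a Parquet file (Uint8Array) and the total file
// size, return the byte range that holds the footer metadata.
function footerRange(trailer, fileSize) {
  if (trailer.length !== TRAILER_BYTES) {
    throw new Error("expected the last 8 bytes of the file");
  }
  const magic = String.fromCharCode(...trailer.subarray(4, 8));
  if (magic !== MAGIC) {
    throw new Error("not a Parquet file");
  }
  const view = new DataView(trailer.buffer, trailer.byteOffset, trailer.byteLength);
  const footerLength = view.getUint32(0, true); // little-endian
  const end = fileSize - TRAILER_BYTES; // exclusive
  const start = end - footerLength;
  if (start < 4) {
    // The file also begins with a 4-byte magic, so the footer cannot start earlier.
    throw new Error("footer length exceeds file size");
  }
  // HTTP Range headers use inclusive end offsets.
  return { start, end, header: `bytes=${start}-${end - 1}` };
}
```

For a 1 GB file with a footer of a few kilobytes, the two range requests transfer a tiny fraction of the file.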

The Future

  • WebGPU: Direct GPU access from browsers for ML training
  • Decentralized analytics: Query encrypted data locally with keys stored client-side
  • Browser-based data apps: Full analytical applications without backend infrastructure

0:00anyway, for this session I would like you to dream. It's a session about Wasm and

0:08also DuckDB. If you've heard about Wasm and kind of know what it is, but you don't really know its use cases for analytics, and especially what DuckDB has to bring within the Wasm ecosystem, then this session is for you. So usually the Quack and Code session is happening every other week, so this week we have

0:29also a guest, Christophe, who's going to join us shortly, and basically the goal is to have an informal chat about the technical topic and then dive into code. So we will have a small demo and you can also feel free to code along, and

0:49if you're live, please say hi in the comments and tell me where you're coming from, and if it's also super cold on your side. So are there people over there? Yes, I see some comments

1:12coming, and so we are also live streaming on Twitter. I never actually did that, so this is also a test, but we are on LinkedIn, YouTube. Okay, Jordan from Florida, hi. Yeah, I know, I know, it's just that it's pretty cold. I get cold easily, so

1:40yeah, that's the thing. Someone is asking what's the hat. I have too

1:48many hats, as people can tell. There is a legend that says you know how many hats I have by the time you've been working with me, because you get to see them and count them over time. Anyway, I welcome my guest. I don't know, I did a lot of things wrong for this live

2:08stream and I see the overlay background is not right. This is better, and there we

2:16have Christophe, welcome. Thank you for the invitation, I'm so happy to be here, and no worries, mistakes happen every time. Yeah, so

2:28so the thing, what happened exactly, if you want to know, is that if you click on the live stream event and you stream and you stop the event, because I was like, oh, let's not go live right now, then it basically just considers this event closed and there is no reason why you should go back

2:48online, and so basically with the event created, people might be confused. But I remember I did this once with StreamYard. Yeah, I messed it up as well once, so it happens. Yeah, so we also have someone from Toulouse, so you have some French friends, right, because Christophe is French. I'm not French, by the way. I do have a French accent, but please, not

3:15everybody that speaks French is French, right? Be careful with your [Laughter] shortcuts. But Christophe, please introduce yourself a bit for the audience, for people that actually don't know you. So hey people, so I'm Christophe

3:37Blefari, I'm French. I graduated

3:43years ago in software engineering, but since I've started working I'm mainly doing data engineering. I would say that I'm a data engineer, software engineer, I don't care actually, but I do data engineering mainly. I'm passionate about a lot of things, I would say, but in my free time I mainly do like video

4:05games um I bike I as well started back running like in August last year so I run a lot like

4:17four times a week something like this and yeah yeah so right now so I'm in Paris so it's not cold it's just rainy but um

4:31right now I have an excuse because I got a tund and so I cannot run any more so in the last 10 days I didn't I didn't don't you don't need to tell me an excuse it's it's fine I I would I wouldn't run with it it's it's too it's too cold I mean it's it's even dangerous like actually

4:50this is like the Berlin proverb that there is no cold, there are just bad clothes. Yeah, indeed. So maybe that's why I'm wearing this hat, but anyway. And also what you didn't mention: Christophe is writing an amazing newsletter, which is called the Data News, and that's actually how we got to know each other, I think, initially, right,

5:14because we were both writing data engineering content. Really great newsletter. You spent a lot of time on that and it's a bit on a break, can you tell us a bit more about that? Yes, so you can

5:29find my newsletter on my personal website slash blog, it's on blef.fr. I guess you have like maybe up as well the assets so we have no name but yeah so it's my my my myo um and uh the newsletter is weekly

5:47uh I started it again like two weeks ago, so it's not on break right now, it was on break in December. And the idea of the newsletter is to do a curation of everything that is going on in the data ecosystem, so I curate articles in data

6:07engineering analytics engineering um data science and data

6:16analysis, and the idea is just to give people the best articles that have been written in the last week, with spicy opinions and with my French sarcasm, which is something that people like actually. This is one of the pieces of feedback I get the most, which is like, yeah, we like your tone and your

6:41French sarcasm so yeah that's something natural but uh I continue to do it uh

6:49like this. Cool. No, but it's definitely

6:53really well written, so I strongly recommend it. I'm actually not following that many data engineering newsletters, I would say, but I think yours is definitely one I would recommend. Specifically, just one question: do you follow it by email, like you receive the email, or you have like an RSS

7:18feed? I do, I have an email for forwarding. I use Readwise, you know, basically it's managing, it helps you

7:29manage your newsletter that you receive by mail and I have also RSS feed there but I think yours is just connected to to my email that's your question yeah we have people from uh sorry uh from Brazil just because it's uh you know some people are are bragging about the their temperature 25 degrees Celsius that's quite cool for

7:54a standar yeah all right uh anyway was

7:59let's get into it, because we're already a bit late with the agenda. So who actually knows, in the audience, what Wasm stands for? I'm

8:11curious where people actually heard of it for the first time. Can you tell us, Christophe, where did you hear about Wasm for the first time? Probably related to video games actually. You know, I stopped recently, but at some point I was on Hacker News, like in my life

8:35I was on Hacker News like every two hours, it was like, I was wasting so many hours on it. And you know, all the translations of old video games into the browser in Wasm, like Doom,

8:55Mario and stuff, every time they make the front page of Hacker News, and maybe that's how I knew about it the first time, maybe five years ago, something like this, with the first translation of Doom, stuff like this. Yeah, yeah, indeed, so

9:13Wasm actually stands for WebAssembly, for

9:17people that don't know, and basically it's mostly used for speeding up web applications by allowing them to run code written in multiple languages. Usually it's written in low-level languages like C++, C or Rust. There are also other interfaces, and I think Golang too, I've seen that. That's the most common, but

9:46yeah it's actually started do you know when when it's gonna when it started I would say it started like with maybe the HTML 5 like specification so I would say 2005 something like this 2007 I don't know I didn't look oh yeah uh way way way later actually the first

10:08version was actually in 2017, so it's only, okay, six

10:16years ago. Yeah, I think it's probably the GA. You know, those kinds of things

10:23take time, they get a lot of attention before getting GA, so that's probably why you heard of it before, the same as with HTML5. So that's the kind of thing. So we understand so far that Wasm is basically a way to

10:47run something in the browser with other types of languages, it's pretty lightweight. I have a couple of use cases here I want to discuss, let's dive in, I have a couple of articles. Just one question, maybe I'm mistaken, but

11:04Wasm to me is not a way to run something in the browser in another language, it's more like a port of something. Yeah, it depends, but it's like a port of another language in the browser, so when you have a specific library it has been, not translated, but compiled into JavaScript within the browser

11:32so it emulates kind of your um language

11:37behavior, but actually it runs in JavaScript, not in the language you want to run it, I guess. Maybe I'm mistaken, but... On the target, yeah, on the target. So I'm neither a JavaScript nor a Wasm expert. The way I understand it is that it's basically a mini VM that runs in your browser. There is a JavaScript

11:59interface to the Wasm compiler, but Wasm can actually be run somewhere else, on the edge or in a container, and so this is

12:10actually one of my article I want to share um let me grab that uh so for

12:16people that's been using um

12:21Docker let me share my

12:27screen. So yeah, Docker has actually introduced the usage of Wasm in technical preview. That was a couple of months ago, okay, I see the date. Let me zoom out. And basically they explain actually what Wasm is,

12:57so they mostly explain that it enables you to run different languages in a sandboxed environment, and they also give you examples: Figma, which I know is a big user of Wasm, Photoshop, it's like

13:14the because Photoshop has a web version uh I believe um and and Disney

13:21Plus apparently, so there are a lot of companies already using it. So the point is that indeed it's not only running in our browser, but this is where we mostly see a lot of improvement, right, because we have a lot of heavy applications, Figma is one of them, right. So that's a way

13:41basically to speed up things and so it's really interesting you can go over there not going to go over the detail here um but basically they explain how they integrated with the the docker engine and what they they they they did it and you can also so start to play it and basically run uh a wasm container in a

14:04in a specific language, so I'm really looking forward to where this is going to

14:11head. I don't know if you heard of that before or not. No, no, but it looks... yeah, I have so many ideas with this. I'm curious if any of the audience have heard of that. So the article is, I cannot figure out, it's a couple of months, okay, so the date is there. So yeah, it's

14:33already pretty old it was in technical preview so I'm not sure what's the the state right now but it's a year behind maybe they dropped the idea for for a different reason uh but the point is that there there is a lot of uh exploration around this so another thing uh I want to talk and it's probably I

14:53mean you talked about that that you heard was um around uh video game

15:00topic um and so this is have you heard

15:04of web GPU yeah I've seen this I didn't read it but I have it like in my book Yeah so basically webgl is the old standard uh to access GPU capabilities within your browser um and basically uh

15:26what WebGPU does is basically

15:32enable you to tap directly into a lower level of your GPUs. And why is that related to Wasm?

15:43It's because if your browser has a better interface to your graphics card for, let's say, training models, and if Wasm enables you to have a sandboxed environment, it enables you to train a model within just a browser URL, so everything is running locally, not on the server side. And there is also TensorFlow.js that exists,

16:09right so I guess some people hopefully we have data people um on live here but

16:17basically TensorFlow.js is a web

16:24assembly version to be able to run TensorFlow on the edge, so you would train or run inference on a model in your browser, on the edge. So you can imagine that within a URL someone gets basically all the packages needed to train a model, so without installing anything. And on top of that, that's where I'm doing the link with WebGPU, it

16:47could tap directly into your Hardware uh

16:51GPU, and so that basically lowers the technical barrier to entry for a lot of applications, right, because today if you want to train your model classically, using the full power of the GPU, you're going to run Python locally and so on and train something. Does that speak to you or not, Christophe? Yeah, it speaks to me, but I

17:14guess like the the main limitation on this like accessing because there there have been always like a huge debate in the HTML Community from what I've seen

17:28around the access of the Computer Resources you know and

17:35having a tab that has access to your GPU kind of seems freaky to me, like I'm

17:46not... I mean, the browser tab is already eating all your CPU, so why not give it the GPU as well. But anyway, there is also everything related to games, you were mentioning Doom and so on, but I think we're going to see a lot of really powerful games, like 3D

18:05games, running in the browser, so I'm really excited about that. I think this is still really early. WebGPU was released, I think, in the middle of last year by Chrome, and so it takes time for other browsers to also support this, but there are clearly a lot of signals that it's going in that

18:26direction. So that's kind of the introduction around Wasm and the use cases. Now let's narrow it down to... There is something, sorry to interrupt you, but just to add maybe to the concept, there is something that has been made, the

18:47reason WebAssembly is going to be a thing in the future is that at the moment our laptops and computers, just our work laptops, have a lot of resources, a lot of capabilities, so we get more power than before in our home computers or work computers,

19:08and so that that's crazy because we don't need a server like we needed before because for a lot of people for instance when you have like a M1 MacBook uh maybe your MacBook has more like CPU and memory than the server you develop on like the remote server you develop on so it's crazy because it means that we

19:29can do so many things locally. No, that's true, thanks for the reminder: everything that basically enables you to do some compute locally, within your browser, without installing anything, and tapping into your GPU, is because our current machines are way stronger and we are mostly using them for browsing. If you look at just the basic MacBook Air

19:55now which is like not that expensive if you compare like 10 years ago um and the power that this machine has is just is just crazy um so I think we are just scratching the surface like at the moment it's not this compute is not is not really leverage and um there's our path to to go there I have a question

20:18here on YouTube: are GPUs well suited for the kind of workload DuckDB has? Well, not really, we're not leveraging GPUs, but as far as I know, I

20:31may say something stupid, it's more an analytics workload. I think GPUs are way more interesting if you need to train models, that's the classic use case, at least on the data side. Let's narrow it down now. We talked about general use cases of Wasm, WebGPU and so on. We did discuss one

20:56data use case with TensorFlow, but regarding analytics, what are your thoughts today on Wasm and analytics? I mean, apart from DuckDB, and then we dive directly into DuckDB. I think it relates to what I just said actually, the fact

21:23that in in in our browser we can access

21:27to our local MacBook M1 M2 M3 uh infinite when

21:35I say infinite, it's compared to a small machine we run on EC2, infinite power, opens the door to decentralization, I would say, maybe the word is this one, decentralization in terms of analytics. I mean, at the moment, if we look at data platforms, we have like

22:01two ways, I would say: we have data warehouses and we have data lakes. For instance, a lakehouse to me is like a data lake, because it's just vocabulary, but in the end it's files on

22:14blob or cloud storage. So you have data lakes and warehouses, and the issue with those two is that you have to pay in the cloud for some compute. If we find a way to be clever with the usage of analytics in the browser with Wasm, I guess companies

22:40can save a lot of money in term of like um data data compute actually because

22:48yeah, companies have good computers, good laptops, I don't know, but they have good stuff, and if you are clever in the way you access data, in the way you write the query, you can sometimes avoid querying Databricks or BigQuery or, yeah, maybe [inaudible], but you can find a clever

23:12way to avoid calling the remote data platform and do the compute locally, and that's crazy to think about, and to see that it opens many doors, I guess. Yeah, but there is a challenge, right, the architecture is not so obvious, because then you're like, yeah, okay, but my data

23:36is going to be centralized somewhere and you know we put so much effort to put it in the cloud the goal is not to put all your data locally um but and how to to

23:47avoid uh you know uh Network traffic and so on um and regarding that so actually

23:55let's dive into DuckDB. So there are a couple of use cases today I can name. For example, there are multiple notebook-style data tools

24:09or other type of Tool uh so there is like for example uh let me

24:18share my screen why can I not so here for example um evidence uh evidence uh is basically

24:29a way to create dashboards with just SQL and Markdown. It's a JavaScript framework behind the scenes, and now they're using DuckDB-Wasm, which means that if you need to do some filtering and so on, interactivity, you can see it here. Basically what it does behind the scenes is that there is

24:53DuckDB running in your browser locally, executing those queries and displaying the result to you. So that's why it is super snappy, right. And the classic architecture that we have today regarding data visualization tools is that they often rely on a server. Sometimes they're going to cache some results and some queries, but they mostly

25:17cache it, they don't compute things locally. Do you have other use cases like that in mind that you've seen? I guess that's one of the

25:30most obvious ones, I guess. And just this one, to go deeper, it requires maybe,

25:37for instance, to make it work easily and not get lost in a lot of data sets, I guess you need to have one mart with one table, or one big table, the OBT, to not have a big mess, because if in your dashboard you are creating a

25:56lot of different data sets, I guess it's going to be a lot of network calls and stuff. But yeah, I like this use case, and I had one use case in mind regarding this decentralization and the fact that you can use DuckDB-Wasm in the browser, which is: imagine

26:16I don't know if you know a bit about differential privacy, but the idea of differential privacy is to do statistics and computing on top of data that has been ciphered, in order to keep anonymization and privacy and stuff. There might be a use case in

26:42data decentralization where in the central repository, in the warehouse or data lake, you have everything ciphered, and

26:52the only place where the stuff is deciphered is in the browser, like the client reading the data is coming with his decryption key, like with

27:03the key, and you don't have the key if you don't have the token. You could tie it to a specific hardware too. Yes, can

27:15you have... yeah, that's fine. Yeah, so in terms of security too, yeah, like you mentioned. But again it's like, okay, how do you minimize the traffic and so on. And so, just also a small plug: MotherDuck released their research paper

27:36from CIDR and just released a

27:42blog about it. If you go to motherduck.com/blog, it basically talks about the hybrid compute mode, where we try to leverage as much as possible, to put it in simple words, your local compute and the cloud in concert. But the thing I'm pointing out here is that there is a lot of research that needs to be done

28:04still. We see the possibilities, but products are really just starting, and that's what I said at the beginning of the session, right: I want you to walk away from this session and to dream. So I think there are a lot of things to dream about. Do you have other

28:25ideas in mind? I don't know, but there is just something that unlocks all the network side and avoids a lot of mess on the network, which is actually Parquet, "parquet" in English, I don't know how to say it, but Parquet files

28:47are actually the key to save bandwidth, to save a lot of stuff, because you can do column selection, predicate pushdown for the WHERE, and everything related, thanks to Parquet. I guess that's something that's going to change in the following years, and

29:11maybe one day we're going to get rid of csvs everywhere and stuff like this but we can't we can't it's like a it's like uh Excel you know it's I think it's it's to stay there uh I guess it's a bit different but I guess we we we have to get rid of CSV like in the inter

29:32intermediate part, because on the source part, yeah, okay, I don't care, but in the middle we have to get rid of it. Yeah, cool. We are now going to

29:45dive into something a bit more hands-on. So Christophe showed a really neat demo at a meetup, and that's also why I wanted to cover that topic. It's using DuckDB-Wasm. It's not for dashboard

30:07responsiveness, as I showed for Evidence; it's another use case that I found really interesting, and it's simple to do. So it's a Firefox extension. I'm going to show you the demo first, but first, so Christophe sent me a link with code and he said this

30:30is the current status of the code, and I'm like, yeah, sure. So, you know, without a README, so it's kind of a challenge. I hope you don't hand over your code to your clients like that. No, it depends actually [Laughter] but no, actually I know it's [inaudible], but to be honest I don't like to write READMEs,

30:58but yeah, that's a challenge for me, to explain my code and explain the stuff I write, because yeah, I know

31:08this is one of my issue I'll be uh sharing right now

31:16but this one, in my defense, is an experiment, and I

31:25sent it to you just raw, because I was not ready to put it in the open for people. If I had to put it in the open I would have commented stuff and added more documentation, but for the live I sent it to you raw because

31:47because of lack of time it's fine it's fine I like the challenge I like the challenge all right so if you want you can um clone the repository uh and uh and so as I said

32:03it's an extension, a Firefox extension, and what it does is that if you use Google Cloud Storage, and we can also easily make it work for S3 storage, it's popping up a small window when you're hovering over the file and it's showing you the schema of that file,

32:29and I'm guessing maybe I'm the only one, but please don't leave me alone, in the audience, in the chat, tell me how many times you downloaded a Parquet file just to check, you know, the schema or whatsoever. Tell me, I'm really curious. And Christophe, did you have this use case where you just need to

32:49check a Parquet schema? What do you do? Yeah, I

32:56had it so many times, that was the reason I developed this stuff. Because if you just think about it, when you are in the data lake and you decide to go for Parquet, or any binary format, every time you just want to see your file you have to download the

33:17file, go into a Python virtual environment, start a Jupyter notebook, import pandas, pandas read_parquet and stuff. Like, it was before, like, and all the

33:32that was like the journey so and the issue with this is that okay you click on download but the file is like one gigabyte and you have like a bad connection so you go for a coffee and and and then your your day is f and you for you forgot this and yeah this is like that that that yeah

33:54it's the reason I... No, it's a really common thing, and you might also say, oh, but you know, we have a SQL engine, whether it be Athena, we have a Glue catalog, so we can query it directly from there. But sometimes you have a pipeline that fails, you have a wrong Parquet file, a Parquet

34:15file that you identify, and you cannot actually query it because it's messing up the schema. The SQL engines are applying the schema on read, right, and so the problem with that is that usually you create your table based on the right schema, and so if you have a small Parquet file within all your Parquet files that has

34:33a wrong schema, you basically have a problem where you have to identify this one. And so, just to show quickly, of course that was the pre-DuckDB era, but if you run just the DuckDB CLI, so on macOS

34:49of course, so you can install with brew, you can install, just write this, right. Personally I recommend people to use pip to install rather than, but this is

35:03my... but this is for the CLI, you point out after. The CLI is not really available, as far as I know, with the Python package. Yes, you can with the Python package. Okay, you can point it out directly, but you need to add the path, and why would I care about that,

35:20it's a one-line command. Oh, you're right, oh, you're right, okay. So once you install,

35:29so here it's a bit specific, it's a bit hacky for Google Cloud Storage. Why? Because you need to set up the endpoint to say that it's Google Cloud Storage, and you see, oh, I'm setting the S3 endpoint. This is because initially DuckDB was supporting S3, and Google Cloud Storage came along. I believe they are doing some work

35:50around secrets, so we're going to have dedicated secrets based on the source, whether it be S3 or Google Cloud Storage, and I guess for the configuration as well, I'm not sure about that. But basically for the moment you just need to set that and then your credentials. And here it's a public bucket, so

36:12you can actually directly uh read it so this is the address of the public bucket uh you can go and try it um so if I do

36:23this I basically directly can query the the data from from a stre so this is really magic right there is nothing I need to do and I can also um

36:36describe the table if I just want the schema. This is the schema with the DuckDB types, and we're going to see right now what the Parquet types are. But you see, that's super fast. The only thing I would have needed to do is set up the credentials, and note that if you're using

36:57AWS there is also the AWS extension, so with the AWS extension you can load your AWS credentials directly, so you don't need to set up anything and can directly start querying your bucket. That's magic. Of course there is always network bandwidth, right, so if you target every Parquet file, it's

37:22exactly what you mentioned, Christophe: be mindful, maybe, of the column selection, of your partitions, to prune certain Parquet files, yada yada. Going

37:35back to our extension, so what is happening here? What is happening here, Christophe, please? Good question, a lot of things actually. I don't know if it was your first browser extension, but actually, yeah,

37:58without a README, but I sent you the tutorial from Firefox to understand. Yes,

38:05indeed uh still um yeah in the web extension like the stuff is a bit different than in a classic web page but what happens here mainly is that um in

38:17your main main web page there is like a listener uh on the example. bucket uh

38:25link, and when you hover... yeah, so it's

38:30not in the panel yes I guess it's yeah it's here uh is it's here I did a bit of modification and by the way I'm happy to it's a fork I'm happy to contribute back and give back to Caesar what belongs to Caesar it was your idea uh but uh but yeah so there is a

38:52function that basically handling uh based on event so there is a listener on EV but I'm not speaking about this this one I'm sorry I'm not speaking about this one I'm more speaking about the one in the content script okay for the panels yeah so I'm speaking first like about the listener on the on the link so it's

39:14 yeah, this one. So this is the mouseover, line 48. So you listen on the link, and when you hover over a link, it calls the background of the

39:32 extension, which is in the panel JS you just showed before. And in order to do this, the way extensions work is that they communicate with a message system. And so on line 49 it does browser.runtime.sendMessage, and what I do is send a message with the file name I want to read the data

39:57 from, like the Parquet schema, from. And on

40:01 the other side, so on the panel side, there is then a listener that listens for this message and then does the DuckDB magic to get all the bucket information. And to get the bucket information, I actually run a query on it. So if you go on the panel... Yeah, yeah, in [inaudible], yeah. And that's what we can try

40:24 and see. The rest is basically, this is mostly just the panel that you see on the screenshots, right, on the demo here. And so, yeah, so this

40:36 is the listener on the mouse, and then we basically have the logic here, which basically loads

40:48 DuckDB-Wasm, instantiates the DuckDB

40:53 database, and creates a connection. Here we just log the DuckDB version to say this is working. We'll see it directly after, when we load the extension, in the console log of Firefox. Then

41:08 we set the credentials, the endpoint as I said for Google Cloud Storage. You don't need to set those if you're querying S3. And this is if you need to put

41:25 some credentials for a private bucket. Here the bucket is public, so we don't need to put any credentials.
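As a rough illustration of the setup steps just described (log the version, point the S3 settings at Google Cloud Storage, add credentials only for a private bucket), the statements involved might be assembled like this. It is a sketch in Python rather than the extension's own JavaScript. The helper names `sql_literal` and `build_setup_statements` are hypothetical, and the `storage.googleapis.com` endpoint and the `s3_*` setting names are assumptions about what the extension sets, not copied from its code.

```python
from typing import List, Optional


def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_setup_statements(
    endpoint: Optional[str] = None,
    access_key_id: Optional[str] = None,
    secret_access_key: Optional[str] = None,
) -> List[str]:
    """Return the SQL statements to run once after opening a connection.

    - Always starts with a version check, mirroring the "log the DuckDB
      version" step from the walkthrough.
    - Adds the endpoint only when one is given (assumed to be needed for
      Google Cloud Storage, not for plain S3).
    - Adds credentials only when both keys are given (private buckets).
    """
    statements = ["SELECT version();"]
    if endpoint is not None:
        statements.append(f"SET s3_endpoint = {sql_literal(endpoint)};")
    if access_key_id is not None and secret_access_key is not None:
        statements.append(f"SET s3_access_key_id = {sql_literal(access_key_id)};")
        statements.append(
            f"SET s3_secret_access_key = {sql_literal(secret_access_key)};"
        )
    return statements


if __name__ == "__main__":
    # Public bucket on Google Cloud Storage: endpoint only, no credentials.
    for statement in build_setup_statements(endpoint="storage.googleapis.com"):
        print(statement)
```

Keeping the credentials optional matches the demo, where the bucket is public and only the endpoint is set.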

41:36 And here there is a small function to send the query. There is the listener on the mouseover, and then on the mouseover there is the function. And so this is the thing that I changed a bit, because Christophe was like, what the hell is this? But it's just...

41:56 No, no, getting the file name. It was actually just missing something, the bracket. And then I'm breaking out the URL of the bucket. And so basically, when you are on Google Cloud Storage, you have your URL that looks like that, and actually, let me share

42:23 that directly. So if I go into the

42:28 console, developer, and then Cloud Storage...

42:38 Whoops. So you see, here I am on the bucket in question, and you can see how the URL is built: as part of the URL you have the bucket name. So this is what I'm inferring here, basically, to build the full path. And then basically I'm doing the query. So this is kind of the same thing

43:02 that I was doing from the CLI just before. And actually I still have the CLI open, so we can run this query here. Sorry, I have the sharing, I cannot

43:18 see, I have the little sharing thing which is blocking. I cannot even help you, because there is a popup on the, on the asset, so I cannot even see the C... Oh yeah, yeah, sorry, yeah, I'm going to hide this one. Yep. You are still

43:36 seeing the, the [inaudible], and it's

43:40 for, it's super nice because [inaudible] put it there. All right, so basically it's

43:48 this path, right? And so I have an extra...

43:56 I'm going to go there, but basically this function is pretty similar to describing the table, but here we're really getting the Parquet types, right?

44:08 This is the DuckDB type, and we have more flexibility, but roughly they're the same. Another function which I want to share with you is

44:25 SUMMARIZE, which is pretty cool because it gives you statistics directly, yeah, on the different fields: what's the minimum, maximum, average, and so on. And so you could actually include that in your panel, but it requires a bit more compute, right? But it's not really, because some of the data is already, like, in the,

44:50 in the bucket, no? I guess. Yes, indeed, yeah,

44:55 some of them, but I believe not everything. That's actually a good question, I can come back to that. Anyway, because we have nine minutes left and we need to... we've mostly covered everything in the code, I think. So yeah, the query is there, we get the file path, and that's mostly

45:18 it. So you see, in the end there is not that much. There are these two files,

45:25 basically, right? Yes. Yeah, for this extension you have your manifest. The manifest just specifies where the different things are, and the permissions, for security reasons. So now, if we launch Firefox... not now, no, I'm sorry, Firefox,

45:50 not yet, I'm not yet ready for that. I'm using Brave, by the way. So you

45:58 can, what you could do is go to about:debugging, and then

46:06 you can load a temporary add-on, so that's what we're going to do. And so here, this is my repository. You can pick any file, pick the manifest, and now you have your extension, which is loaded for this session. All right, we're also going to open Inspect, I'm going to put it there, so that will help us to debug

46:33 if there is a beautiful failure in the demo, which obviously there is going to be because it's a live stream, and you can roast me in the comments if it's not working, right? Yeah, but if it's not working, it's my fault. No worries. No, because I got time to basically make it worse or make it

46:53 better. Yeah, we'll see. Anyway, so I'm on the page, and here you see what's happening is actually, I don't know if I can zoom that, perfect, is the console log that we

47:09 saw before with the DuckDB version, right? And

47:13 this is, you know, the setting, that is, we are querying Google Cloud Storage. It's a public bucket, so we don't need any extra credentials. And what I'm going to do, I think I just need to reload here and reload the page, because it's probably not detecting it. And it's not working, so, as

47:39 expected. Yes, yes, it's working. So it was just some small event. So you see, every time I'm hovering over things, what's going on is that it's building the path of my file. So if I had another one here, that would work for any file, but here it's building: okay, this is the bucket, this is the file. And actually, let me,

48:05 let me do that quickly. What do you want to do? Oh my God, it's been a while. Do you want to duplicate the file? No, I'm just going to add

48:21 another file.
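The path-building step shown in the hover demo (take the bucket name from the page URL, append the hovered file name) could look roughly like the sketch below. It is written in Python for illustration, while the extension itself is JavaScript. The `/storage/browser/<bucket>/...` console URL layout, the `s3://` scheme, and the function names are all assumptions, not details confirmed by the extension's code.

```python
from urllib.parse import unquote, urlparse

# Assumed layout of a Cloud Storage console page:
#   https://console.cloud.google.com/storage/browser/<bucket>/<folders...>
CONSOLE_PREFIX = "/storage/browser/"


def bucket_prefix_from_console_url(page_url: str) -> str:
    """Return 'bucket/optional/folders' extracted from a console page URL."""
    path = urlparse(page_url).path
    if not path.startswith(CONSOLE_PREFIX):
        raise ValueError(f"not a bucket browser URL: {page_url}")
    remainder = path[len(CONSOLE_PREFIX):]
    # Drop ';tab=objects'-style parameters, URL-decode, skip empty segments.
    segments = [unquote(seg.split(";", 1)[0]) for seg in remainder.split("/")]
    segments = [seg for seg in segments if seg]
    if not segments:
        raise ValueError(f"no bucket name in URL: {page_url}")
    return "/".join(segments)


def parquet_path(page_url: str, file_name: str) -> str:
    """Build the path a query would read for the hovered file name."""
    return f"s3://{bucket_prefix_from_console_url(page_url)}/{file_name}"


if __name__ == "__main__":
    # Hypothetical bucket, folder, and file names.
    url = "https://console.cloud.google.com/storage/browser/my-bucket/data?project=demo"
    print(parquet_path(url, "events.parquet"))
```

Raising an error for pages that do not match the expected layout keeps the hover handler from querying a path that was guessed wrongly.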

48:29 And just while you are looking for a file, there is something very interesting, which is that when you get the metadata, because it's a Parquet file, it only reads the metadata, so it consumes no

48:44 bandwidth. And so that's what we want, actually. Yeah, we just see the schema and we don't consume any bandwidth. Yeah, I'm looking for... okay, this one is

49:01 big. I don't have, I don't have another Parquet file here. I could convert it.

49:10 One day of data. I don't know what this data is, I have no idea what it is. We're going to see, because, sensitive data.

49:22 Yeah, so let's see.

49:29 I think I need to... the event page was not... looks like... no, it's okay. Boom. Ah, I know what this is, it's PyPI data. So you have, for PyPI, so the

49:47 Python library packages, and it probably has one day of data for a specific package. But I just wanted to see, okay, so now there are multiple files and I just hover, and I have, you know, this working. That's pretty neat. So everything it does, as you said, is just, you see, doing

50:07 different queries behind the scenes to DuckDB-Wasm, and it's super light, super fast. So yeah, that's

50:16 roughly that. So if you want to try it, you can go to the GitHub repository and play around. Don't forget you need to... I'll add it to the README, because I'm a good man, I'll add it directly. But basically, in Firefox you go to debugging, load the extension, pick any

50:41 file from the repository, and then have your inspector next to it, so that you can restart the load of the extension and see what's going on. But yeah, a pretty neat

50:56 use, really. Again, thank you,

51:00 Christophe, for the idea. What do you want to do next with that one? I guess the next step is finding some time to just push it to production, I would say, in quotes, like just putting the extension on the store. Maybe adding a feature where, when you click on the extension

51:22 icon, you can add your secret key and access key, so that, yeah, the secret key and the access key are not in the code. And I guess make it work for S3 as well as GCS, because I guess a lot of people are S3 users for cloud storage. And then maybe as well do it

51:45 for Google Chrome, because maybe more people are using Google Chrome than Firefox. And I think that would be

51:55 enough for the extension to live for a few months, years, I don't know, until people really use it. I don't know. Yeah, it's just, no, until people implement it directly. I mean, that's an obvious use case for Google, right? Please, Google, update your UIs so that we don't need to build

52:17 extensions. Yeah, that's true. But to be honest, I started to build extensions when I was doing stuff on dbt, and now that I know how to build extensions, every time I have an issue on a website I'm like, okay, I'm going to do an extension to fix my issue. Yeah, that's maybe,

52:37 maybe a bit too much, but yeah, I get the point. I think there is, I mean, the one I'm using is actually, I'm doing the black theme, what else, and other stuff, but actually not that much. Ah yeah, Wappalyzer, that

52:57 enables you to inspect the kind of framework and programming language that a website is using. When I see a beautiful website, I'm like, what are they using? And so, yeah. So this is, okay, this is the kind of extension we build, actually, just a nerd thing to understand what's behind the scenes. Anyway, we are closing in. Christophe, thank

53:19 you so much for your time and your code. I was really just bothering you with that, [inaudible], but it's fine. I have the fork in the link. I will put this one for now because it's just a bit cleaner, but I'm happy to fork it back to your main repo, so let's give it a

53:40 think. And have a beautiful day, viewers,

53:44 wherever you are, or a good night if you're... Also a good night for us. Yeah, in European time it's getting to be 10 in the evening, so a bit late. See you, see you, thanks.

FAQs

What is WebAssembly (WASM) and how does DuckDB use it for analytics?

WebAssembly (WASM) lets high-performance code written in languages like C++ run in web browsers inside a sandboxed virtual machine. DuckDB compiles to WASM, so the entire analytical database engine can run directly in your browser with no server needed. This opens up use cases like interactive dashboards (as seen in tools like Evidence), browser extensions for inspecting Parquet file schemas on cloud storage, and local data exploration, all with zero server-side compute. Learn more about DuckDB WASM for web data exploration.

How can DuckDB WASM be used in a browser extension to inspect Parquet files?

The video demonstrates a Firefox browser extension that uses DuckDB WASM to inspect Parquet file schemas directly from Google Cloud Storage. When you hover over a Parquet file link in the GCS console, the extension uses DuckDB WASM running in the browser to read only the file's metadata (consuming no bandwidth for the actual data) and displays the schema in a popup panel. This removes the need to download files, open a Python virtual environment, and use pandas just to check a file's schema.
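The schema lookups mentioned in the video can be written as plain DuckDB SQL. The sketch below, in Python for illustration, only builds the query text: `DESCRIBE` reports the schema with DuckDB types, `parquet_schema` reports the types stored in the Parquet file, and `SUMMARIZE` computes per-column statistics and therefore needs more than the schema. The helper names and the example path are hypothetical, and which of these queries the extension actually runs is an assumption.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def describe_query(path: str) -> str:
    """Schema of the file expressed in DuckDB types."""
    return f"DESCRIBE SELECT * FROM {sql_literal(path)};"


def parquet_schema_query(path: str) -> str:
    """Schema as recorded in the Parquet file itself (Parquet types)."""
    return f"SELECT * FROM parquet_schema({sql_literal(path)});"


def summarize_query(path: str) -> str:
    """Per-column statistics such as min, max, and average."""
    return f"SUMMARIZE SELECT * FROM {sql_literal(path)};"


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    path = "s3://my-bucket/data/events.parquet"
    for query in (describe_query(path), parquet_schema_query(path), summarize_query(path)):
        print(query)
```

For a hover panel, the first two are the cheaper options, since they only need the schema and not the column values.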

What are the benefits of running analytics locally in the browser with WASM?

Running analytics in the browser via WASM has several practical benefits: no network latency on queries since no server round-trips are needed, no installation or setup for end users, reduced cloud compute costs since processing happens on the client's machine, and better data privacy since data can be decrypted and processed locally without leaving the browser. Modern laptops have significant CPU and memory resources that go largely unused, making client-side analytics increasingly practical.

How do you query files on S3 or Google Cloud Storage using DuckDB?

DuckDB can query files on object storage (S3, GCS, Cloudflare R2) using the HTTPFS extension, which auto-loads when you reference an S3 path. For public buckets, no credentials are needed. Just use SELECT * FROM 's3://bucket/file.parquet'. For Google Cloud Storage, you need to set the S3 endpoint to the GCS endpoint and configure credentials. DuckDB auto-detects the file format from the extension, so you don't need to explicitly call read_parquet(). The friendly SQL syntax handles it for you.
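To make the query side concrete, here is a small sketch, in Python for illustration, that builds the SQL text for reading a remote file directly by path, optionally selecting only some columns and limiting rows to keep network transfer down. The helper names, bucket, and column names are hypothetical. The `aws` extension statements at the end are the credential-loading route mentioned in the video and are listed as an assumption about typical usage, not as the only way to configure credentials.

```python
from typing import List, Optional, Sequence


def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Quote a column name as a SQL identifier (double quotes doubled)."""
    return '"' + name.replace('"', '""') + '"'


def select_query(
    path: str,
    columns: Optional[Sequence[str]] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a SELECT over a file path; the format is inferred from the path."""
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")
    projection = ", ".join(quote_identifier(c) for c in columns) if columns else "*"
    query = f"SELECT {projection} FROM {sql_literal(path)}"
    if limit is not None:
        query += f" LIMIT {limit}"
    return query + ";"


# Assumed usage of the aws extension to pick up locally configured credentials.
AWS_CREDENTIAL_STATEMENTS: List[str] = [
    "INSTALL aws;",
    "LOAD aws;",
    "CALL load_aws_credentials();",
]


if __name__ == "__main__":
    print(select_query("s3://bucket/file.parquet"))
    print(select_query("s3://bucket/file.parquet", columns=["name", "version"], limit=10))
```

Selecting named columns instead of `*` matters most for Parquet, where only the requested columns need to be fetched over the network.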
