DuckDB WASM: Run SQL Analytics Directly in the Browser
2024/01/19
TL;DR: Explore WebAssembly (Wasm) use cases for analytics, see DuckDB running in the browser, and build a Firefox extension that displays Parquet file schemas using DuckDB-Wasm.
What is WebAssembly?
WebAssembly (Wasm) is a binary instruction format that enables running code written in languages like C++, Rust, and Go in web browsers. First released in 2017, it's now powering applications like Figma, Photoshop Web, and Disney+.
Key concepts:
- Sandboxed execution: Code runs in an isolated environment
- Near-native performance: Often much faster than JavaScript for compute-heavy tasks
- Portable: Runs in browsers but also in Docker containers and edge environments
Why Wasm Matters for Analytics
Modern laptops have incredible computing power that's often underutilized:
- A MacBook Air today may have more CPU and memory than many cloud servers
- Running analytics locally eliminates network latency and reduces cloud costs
- Parquet files enable efficient column selection and predicate pushdown over the network
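To make column selection and predicate pushdown concrete, here is a minimal sketch of a query builder that asks DuckDB for only the columns and rows it needs, so that only the matching column chunks and row groups have to travel over the network. `buildPrunedQuery`, `quoteIdent`, and `quoteLiteral` are hypothetical helpers written for this illustration, and the URL and column names are placeholders; only `read_parquet` is a DuckDB function.

```javascript
// Sketch: build a DuckDB query that reads only the needed columns and rows.
// These helpers are hypothetical and not part of DuckDB or DuckDB-Wasm.
function quoteIdent(name) {
  // Double-quote identifiers and escape embedded double quotes.
  return '"' + String(name).replace(/"/g, '""') + '"';
}

function quoteLiteral(value) {
  // Single-quote string literals and escape embedded single quotes.
  return "'" + String(value).replace(/'/g, "''") + "'";
}

function buildPrunedQuery({ url, columns, where }) {
  // Naming columns explicitly (instead of SELECT *) enables column pruning;
  // the WHERE clause gives the engine a predicate it can push down.
  const projection = columns.map(quoteIdent).join(', ');
  const filter = where ? ` WHERE ${where}` : '';
  return `SELECT ${projection} FROM read_parquet(${quoteLiteral(url)})${filter}`;
}

const sql = buildPrunedQuery({
  url: 'https://example.com/events.parquet', // placeholder URL
  columns: ['user_id', 'amount'],            // placeholder column names
  where: "event_date >= DATE '2024-01-01'",
});
console.log(sql);
```

The resulting string could be passed to a DuckDB connection; how much data is actually skipped depends on how the Parquet file was written (row group sizes and statistics).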
Combining Wasm with DuckDB means:
- Zero installation: Users can run SQL queries directly in their browser
- Local compute: Process data on the client without server round-trips
- Privacy: Sensitive data never leaves the user's machine
Real-World Examples
- Evidence.dev: BI dashboards using DuckDB-Wasm for instant filtering and aggregation
- TensorFlow.js: Train ML models in the browser using WebGPU for GPU acceleration
- Docker + Wasm: Run Wasm containers alongside traditional containers
Demo: Parquet Schema Browser Extension
Christophe Blefari built a Firefox extension that displays Parquet file schemas when hovering over files in Google Cloud Storage:
The Problem: Checking a Parquet schema traditionally requires:
- Download the file (possibly gigabytes)
- Start a Python environment
- Run pandas/pyarrow to read the schema
The Solution: A browser extension using DuckDB-Wasm that:
- Listens for mouseover events on GCS file links
- Sends a message to the extension's background script
- Reads only the Parquet metadata with DuckDB-Wasm (no full download)
- Displays the schema in a popup panel
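// Initialize DuckDB-Wasm (bundle selection as in the DuckDB-Wasm docs)
import * as duckdb from '@duckdb/duckdb-wasm';
const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
const workerUrl = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker}");`], { type: 'text/javascript' })
);
const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), new Worker(workerUrl));
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
const conn = await db.connect();
// Query the Parquet schema without downloading the full file
// (bucket/file.parquet is a placeholder path)
const result = await conn.query(`
  SELECT * FROM parquet_schema('gs://bucket/file.parquet')
`);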
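The message flow described in the steps above (mouseover listener, message to the background script, schema query) could be sketched roughly as follows. This is an illustration under assumptions, not the extension's actual code: the message type `parquet-schema` and both factory functions are hypothetical, and the browser and DuckDB pieces are injected as parameters so the logic can be read and tried outside an extension.

```javascript
// Content-script side: forward hovered Parquet links to the background script.
// `sendMessage` is injected; in an extension it would be browser.runtime.sendMessage.
function makeHoverHandler(sendMessage) {
  return function onMouseOver(event) {
    const link = event.target;
    if (!link || typeof link.href !== 'string') return null;
    if (!link.href.endsWith('.parquet')) return null;
    // Hypothetical message shape: a type tag plus the file to inspect.
    const message = { type: 'parquet-schema', file: link.href };
    sendMessage(message);
    return message;
  };
}

// Background side: turn an incoming message into a schema query.
// `runQuery` is injected; with DuckDB-Wasm it could wrap conn.query(sql).
function makeMessageListener(runQuery) {
  return function onMessage(message) {
    if (!message || message.type !== 'parquet-schema') return undefined;
    // Escape single quotes so the file name stays inside the SQL literal.
    const file = String(message.file).replace(/'/g, "''");
    return runQuery(`SELECT * FROM parquet_schema('${file}')`);
  };
}

// Wiring inside a real extension (assumed, shown as comments only):
// document.addEventListener('mouseover', makeHoverHandler(browser.runtime.sendMessage));
// browser.runtime.onMessage.addListener(makeMessageListener((sql) => conn.query(sql)));
```

Keeping the handlers free of direct `browser` and DuckDB references is a design choice for this sketch only; the original extension may structure this differently.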
Key Advantages
- Instant schema inspection: No file downloads, just metadata
- Minimal bandwidth for schema checks: Parquet stores the schema in the file footer, so only that small part of the file has to be fetched
- Works with any cloud storage: S3, GCS, Azure Blob (with credentials)
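Why schema checks are cheap follows from the Parquet file layout: a file ends with the footer metadata, then a 4-byte little-endian footer length, then the 4-byte magic `PAR1`. The sketch below computes which byte range holds the footer, given the file size and the last 8 bytes. It only illustrates the layout; it is not how DuckDB-Wasm is implemented internally, and `parquetFooterRange` is a hypothetical helper.

```javascript
// Sketch: locate the Parquet footer from the last 8 bytes of a file.
// `tail` must be a Uint8Array holding exactly the final 8 bytes.
function parquetFooterRange(fileSize, tail) {
  if (tail.length !== 8) throw new Error('expected the last 8 bytes of the file');
  const magic = String.fromCharCode(tail[4], tail[5], tail[6], tail[7]);
  if (magic !== 'PAR1') throw new Error('missing PAR1 magic at end of file');
  // The 4 bytes before the magic hold the footer length (little-endian).
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  const footerLength = view.getUint32(0, true);
  const start = fileSize - 8 - footerLength;
  const end = fileSize - 8 - 1; // inclusive, as in HTTP Range headers
  // The file also starts with a 4-byte magic, so the footer cannot begin earlier.
  if (start < 4) throw new Error('footer length exceeds file size');
  return { footerLength, rangeHeader: `bytes=${start}-${end}` };
}

// Example: a 1,000,000-byte file whose footer is 1,000 bytes long.
const tail = Uint8Array.of(0xe8, 0x03, 0x00, 0x00, 0x50, 0x41, 0x52, 0x31);
const footer = parquetFooterRange(1000000, tail);
console.log(footer.rangeHeader);
```

A client could send the returned value as an HTTP `Range` header to fetch only the footer, which is why a gigabyte-sized file does not need to be downloaded to read its schema.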
The Future
- WebGPU: Direct GPU access from browsers for ML training
- Decentralized analytics: Query encrypted data locally with keys stored client-side
- Browser-based data apps: Full analytical applications without backend infrastructure
Transcript
0:00 Anyway, for this session I would like you to dream. It's a session about Wasm and
0:08 also DuckDB. If you've heard about Wasm, you kind of know what it is, but you don't really know its use cases for analytics, and especially what DuckDB has to bring within the Wasm ecosystem, then this session is for you. So usually the Quack and Code session is happening every other week, so this week we have
0:29 also a guest, Christophe, who is going to join us shortly, and basically the goal is to have an informal chat about the technical topic and then dive into code. So we will have a small demo, and you can also feel free to code along, and
0:49 if you're live, please say hi in the comments and tell me where you're coming from, and if it's also super cold on your side. So, are there people over there? Yes, I see some comments
1:12 coming, and so we are also live streaming from Twitter. I never actually did that, so this is also a test, but we are on LinkedIn, YouTube. OK, Jordan from Florida, hi. Yeah, I know, it's just that it's pretty cold. I get cold easily, so
1:40 yeah, that's the thing. Someone is asking what's with the hat. I have too
1:48 many hats, as people can tell. There is a legend that says you know how many hats I have by the time you've been working with me, because you get to see them and count them over time. Anyway, I welcome my guest. I don't know, I did a lot of things wrong for this live
2:08 stream, and I see the overlay background is not right. This is better, and there we
2:16 have Christophe. Welcome. Thank you for the invitation, I'm so happy to be here, and no worries, mistakes happen every time. Yeah, so
2:28 what happened exactly, if you want to know, is that if you click on the live stream event and you stream and you stop the event, because I was like 'oh, let's not go live right now', then they basically just consider the event closed and there is no reason why you should go back
2:48 online, and so basically with the event created, people might be confused. But I remember I did this once with StreamYard. Yeah, I messed it up as well once, so it happens. Yeah, so we have also someone from Toulouse, so you have some French friends, right, because Christophe is French. I'm not French, by the way. I do have a French accent, but please, not
3:15 everybody that speaks French is French, right, be careful with your [Laughter] shortcuts. But Christophe, please introduce yourself a bit for the audience, for people that actually don't know you. So, hey people, I'm Christophe
3:37 Blefari, I'm French. I graduated
3:43 years ago in software engineering, but since I've started working I'm mainly doing data engineering. I would say that I'm a data engineer, software engineer, I don't care actually, but I do data engineering mainly. I'm passionate about a lot of things, I would say, but in my free time I mainly do video
4:05 games, I bike, and I started running again in August last year, so I run a lot, like
4:17 four times a week, something like this. And yeah, so right now I'm in Paris, so it's not cold, it's just rainy, but
4:31 right now I have an excuse because I got tendinitis and so I cannot run anymore, so in the last 10 days I didn't. You don't need to tell me an excuse, it's fine, I wouldn't run with it, it's too cold, I mean it's even dangerous. Actually
4:50 this is like the Berlin proverb that there is no cold, there are just bad clothes. Yeah, indeed, so maybe that's why I'm wearing this hat. But anyway, also what you didn't mention: Christophe is writing an amazing newsletter, which is called the Data News, and that's actually how we got to know each other, I think, initially, right,
5:14 because we were both writing data engineering content. Really great newsletter, you spent a lot of time on that, and it's a bit on a break. Can you tell us a bit more about that? Yes, so you can
5:29 find my newsletter on my personal website slash blog, it's on blef.fr. I guess you maybe have it up as well [inaudible], but yeah, so it's my pseudo, and the newsletter is weekly.
5:47 I started it again like two weeks ago, so it's not on break right now, it was on break in December. And the idea of the newsletter is to do a curation of everything that is going on in the data ecosystem, so I curate articles in data
6:07 engineering, analytics engineering, data science and data
6:16 analysis, and the idea is just to give people the best articles that have been written in the last week, with spicy opinions and with my French sarcasm, which is something that people like actually. This is one of the pieces of feedback I get the most, which is like 'yeah, we like your tone and your
6:41 French sarcasm', so yeah, that's something natural, but I continue to do it
6:49 like this. Cool. No, but it's definitely
6:53 really well written, so I strongly recommend it. I'm actually not following that many data engineering newsletters, I would say, but I think yours is definitely one I would recommend. Just one question: do you follow it by email, like you receive the email, or do you have an RSS
7:18 feed? I do, I have an email for forwarding. I use Readwise, you know, basically it helps you
7:29 manage the newsletters that you receive by mail, and I have also RSS feeds there, but I think yours is just connected to my email. That's your question, yeah. We have people from, sorry, from Brazil, just because, you know, some people are bragging about their temperature, 25 degrees Celsius, that's quite cool for
7:54 a [inaudible], yeah. All right, anyway, Wasm,
7:59 let's get into it, because we are already a bit late with the agenda. So who in the audience actually knows what Wasm stands for? I'm
8:11 curious where people actually heard of it for the first time. Can you tell us, Christophe, where did you hear about Wasm for the first time? Probably related to video games actually. You know, I stopped recently, but at some point in my life I was on Hacker News,
8:35 I was on Hacker News like every two hours, I was wasting so many hours on it. And you know, all the translations of old video games into the browser in Wasm, like Doom,
8:55 Mario and stuff, every time they make the front page of Hacker News, and maybe that's how I knew about it the first time, maybe five years ago, something like this, with the first translations with Doom, stuff like this. Yeah, indeed, so
9:13 Wasm actually stands for WebAssembly, for
9:17 people that don't know, and basically it's mostly used for speeding up web applications by allowing them to run code written in multiple languages. Usually it's written in low-level languages like C++, C or Rust. There are other interfaces too, I think Golang too, I've seen that. Those are the most common, but
9:46 yeah, do you know when it started? I would say it started maybe with the HTML5 specification, so I would say 2005, something like this, 2007, I don't know, I didn't look. Oh yeah, way, way later actually, the first
10:08 version was actually in 2017, so it's only, OK, six
10:16 years ago. Yeah, I think it's probably the GA, you know, those kinds of things
10:23 take time, they get a lot of attention before getting GA, so that's probably why you heard of it before, same with HTML5. So that's the kind of thing. So we understand so far that Wasm is basically a way to
10:47 run something in the browser with other types of languages, it's pretty lightweight. I have a couple of use cases here I want to discuss, let's dive in, I have a couple of articles. Just one question, maybe I'm mistaken, but
11:04 Wasm to me is not a way to run something in the browser in another language, it's more like a port of something. Yeah, it depends, but it's like a port of another language in the browser, so when you have a specific library, it has been, not translated, but compiled into JavaScript within the browser,
11:32 so it kind of emulates your language
11:37 behavior, but actually it runs in JavaScript, not in the language you want to run it in, I guess. Maybe I'm mistaken, but... On the target, yeah, on the target. So I'm neither a JavaScript nor a Wasm expert. The way I understand it is that it's basically a mini VM that runs in your browser. There is a JavaScript
11:59 interface to the Wasm compiler, but Wasm can actually be run somewhere else, on the edge or in a container, and so this is
12:10 actually one of the articles I want to share. Let me grab that. So for
12:16 people that have been using
12:21 Docker, let me share my
12:27 screen. So yeah, Docker has actually introduced the usage of Wasm in technical preview. That was a couple of months ago, OK, see the date. Let me zoom out. And basically they explain what Wasm actually is.
12:57 So they mostly explain that it enables you to run different languages in a sandboxed environment, and they also give you examples, so Figma, that I know is a big user of Wasm, Photoshop, it's like
13:14 because Photoshop has a web version, I believe, and Disney
13:21 Plus apparently, so there are a lot of companies already using it. So the point is that indeed it's not only running in our browser, but this is where we mostly see a lot of improvement, right, because we have a lot of heavy applications, Figma is one of them, right. So that's a way
13:41 basically to speed up things, and so it's really interesting. You can go over there, I'm not going to go over the details here, but basically they explain how they integrated it with the Docker engine and what they did, and you can also start to play with it and basically run a Wasm container in a
14:04 specific language, so I'm really looking forward to where this is gonna
14:11 head. I don't know if you heard of that before or not. No, no, but looks, yeah, I have so many ideas with this. I'm curious if anyone in the audience has heard of that, because the article is, I cannot figure out, it's a couple of months, OK, so the date is there. So yeah, it's
14:33 already pretty old, it was in technical preview, so I'm not sure what the state is right now, it's a year behind, maybe they dropped the idea for a different reason, but the point is that there is a lot of exploration around this. So another thing I want to talk about, and it's probably, I
14:53 mean, you said that you heard of Wasm around the video game
15:00 topic, and so, have you heard
15:04 of WebGPU? Yeah, I've seen this, I didn't read it, but I have it in my bookmarks. Yeah, so basically WebGL is the old standard to access GPU capabilities within your browser, and basically
15:26 what WebGPU does is
15:32 basically enable you to tap directly into a lower level of your GPUs. And why is that related to Wasm?
15:43 It's because if your browser has a better interface to your graphics card for, let's say, training models, and if Wasm enables you to have a sandboxed environment, that enables you to train a model within just a browser URL, so everything is running locally, not on the server side. And so there is also TensorFlow.js that exists,
16:09 right, so I guess some people, hopefully we have data people live here, but
16:17 basically TensorFlow.js is a web
16:24 assembly version to be able to run TensorFlow on the edge, so you would train or run inference on a model in your browser, on the edge. So you can imagine that within a URL someone gets basically all the packages needed to train a model, so without installing anything, and on top of that, that's where I'm doing the link with WebGPU, it
16:47 could tap directly into your hardware
16:51 GPU, and so that basically lowers the technical barrier to entry for a lot of applications, right, because today if you want to train your model classically, using the full power of the GPU, you're gonna run Python locally and so on and train something. Does that speak to you or not, Christophe? Yeah, it speaks to me, but I
17:14 guess the main limitation on this, like accessing, because there has always been a huge debate in the HTML community, from what I've seen,
17:28 around the access to the computer resources, you know, and
17:35 having a tab that has access to your GPU kind of seems freaky to me, like, I'm
17:46 not... I mean, the browser tab is already eating all your CPU, so why not give it the GPU too. But anyway, there is also everything related to games, you were mentioning Doom and so on, and I think we're gonna see a lot of really powerful games, like 3D
18:05 games, running in the browser, so I'm really excited about that. I think this is still really early, WebGPU was released I think in the middle of last year by Chrome, and so it takes time until other browsers also support this, but there are clearly a lot of signals that it's going in that
18:26 direction. So that's kind of the introduction around Wasm and the use cases. Now let's narrow it down to... There is something, sorry to interrupt you, but just to add maybe to the concept, there is something that has been made, the
18:47 reason WebAssembly is going to be a thing in the future is that at the moment our laptops and computers, and just our work laptops, have a lot of resources, a lot of capabilities, so we get more power than before in our home computers or work computers,
19:08 and that's crazy, because we don't need a server like we needed before. For a lot of people, for instance when you have an M1 MacBook, maybe your MacBook has more CPU and memory than the server you develop on, like the remote server you develop on. So it's crazy, because it means that we
19:29 can do so many things locally. No, that's true, thanks for the reminder: everything that basically enables you to do some compute locally, within your browser, without installing anything, and tapping into your GPU, is because our current machines are way stronger and we are mostly using them for browsing. If you look at just the basic MacBook Air
19:55 now, which is not that expensive compared to 10 years ago, the power that this machine has is just crazy. So I think we are just scratching the surface, at the moment this compute is not really leveraged, and there's a path to go there. I have a question
20:18 here on YouTube: are GPUs well suited for the kind of workload DuckDB has? Well, not really, we're not leveraging GPUs, but as far as I know, I
20:31 may say something stupid, it's more of an analytics workload. I think GPUs are way more interesting if you need to train models, that's the classic use case, at least on the data side. Let's narrow it down now. We talked about general use cases for Wasm, WebGPU and so on, we did discuss one
20:56 data use case with TensorFlow, but regarding analytics, what are your thoughts today on Wasm and analytics, I mean apart from DuckDB, and then we dive directly into DuckDB? I think it relates to what I just said actually, the fact
21:23 that in our browser we can access
21:27 our local MacBook M1, M2, M3 infinite, when
21:35 I say infinite, it's compared to a small machine we run on EC2, infinite power, opens the door to decentralization, I would say, maybe that's the word, decentralization in terms of analytics. I mean, at the moment, if we look at data platforms, we have
22:01 two ways, I would say: we have data warehouses and we have data lakes. For instance, a lakehouse to me is like a data lake, because it's just vocabulary, but in the end it's files on
22:14 blob or cloud storage. So you have data lakes and warehouses, and the issue with those two is that you have to pay for some compute in the cloud. If we find a way to be clever with the usage of analytics in the browser with Wasm, I guess companies
22:40 can save a lot of money in terms of data compute actually, because
22:48 yeah, companies have good computers, good laptops, I don't know, but they have good stuff, and if you are clever in the way you access data, in the way you ask the query, you can sometimes avoid querying Databricks, Snowflake, BigQuery or, yeah, maybe [inaudible], but you can find a clever
23:12 way to avoid calling the remote data platform and do the compute locally, and that's crazy to think about, and to see that it opens many doors, I guess. Yeah, but there is a challenge, right, the architecture is not so obvious, because then you're like, yeah OK, but my data
23:36 is going to be centralized somewhere, and you know, we put so much effort into putting it in the cloud, the goal is not to put all your data locally, but how to
23:47 avoid, you know, network traffic and so on. And regarding that, so actually
23:55 let's dive into DuckDB. So there are a couple of use cases today I can name, for example there are multiple notebook-style data tools
24:09 or other types of tools, so there is for example, let me
24:18 share my screen, why can I not, so here for example Evidence. Evidence is basically
24:29 a way to create dashboards with just SQL and Markdown, and it's a JavaScript framework behind the scenes, and now they're using DuckDB-Wasm, which means that if you need to do some filtering and so on, interactivity, you can see it here, basically what it does behind the scenes is that there is
24:53 DuckDB running in your browser locally, executing those queries and displaying the result to you, so that's why it is super snappy, right. And the classic architecture that we have today regarding data visualization tools is that they often rely on a server. Sometimes they're going to cache some results and some queries, but they mostly
25:17 cache, they don't compute things locally. Do you have other use cases like that in mind that you've seen? I guess that's one of the
25:30 most obvious ones, I guess, and just on this one, to go deeper, it requires maybe,
25:37 for instance, to make it work easily and not get lost in a lot of data sets, I guess you need to have one mart with one table, or one big table, the OBT, to not have a big mess, because if in your dashboard you are creating a
25:56 lot of different data sets, I guess it's gonna be a lot of network calls and stuff. But yeah, I like this use case, and I had one use case in mind regarding this decentralization and the fact that you can use DuckDB-Wasm in the browser, which is, imagine,
26:16 like, I don't know if you know a bit about differential privacy, but the idea of differential privacy is to do statistics and to do computing on top of data that has been ciphered, in order to keep anonymization and privacy and stuff. There might be a use case in
26:42 data decentralization where in the central repository, in the warehouse or data lake, you have everything ciphered, and
26:52 the only place where the stuff is deciphered is in the browser, like the client reading the data is coming with his deciphering key,
27:03 with the key, and you don't have the key if you don't have the token. You could tie it to a specific hardware too. Yes. Can
27:15 you have, yeah, that's fine, yeah. No, so in terms of security too, yeah, there is, like you mentioned, but again it's like, OK, how do you minimize the traffic and so on. And so, just also a small plug: MotherDuck released their research paper
27:36 from CIDR and just released a
27:42 blog post about it. If you go to motherduck.com/blog, it basically talks about the hybrid compute mode, where we try to leverage as much as possible, to put it in simple words, your local compute or the cloud in concert. But the thing I'm pointing out here is that there is a lot of research that still needs to be done,
28:04 basically. We see the possibilities, but products are really just starting, and that's what I said at the beginning of the session, right, I want you to walk away from this session and to dream, so I think there are a lot of things to dream about. Do you have other
28:25 ideas in mind? I don't know, but there is just something that unlocks all the networks and avoids a lot of mess in the network, which is actually Parquet, 'parquet' in English, I don't know how to say it, but Parquet files
28:47 actually are the key to save bandwidth, to save a lot of stuff, because you can do the column selection, the predicate pushdown, like for the WHERE, and everything that is related, thanks to Parquet. I guess that's something that's going to change in the following years, and
29:11 maybe one day we're going to get rid of CSVs everywhere and stuff like this. But we can't, we can't, it's like Excel, you know, I think it's here to stay. I guess it's a bit different, but I guess we have to get rid of CSV in the
29:32 intermediate part, because on the source part, yeah OK, I don't care, but in the middle we have to get rid of it. Yeah, cool. We are going now to
29:45 dive into something a bit more hands-on. So Christophe showed a really neat demo at a meetup, and that's also why I wanted to cover that topic. It's using DuckDB-Wasm, it's not for dashboard
30:07 responsiveness as I showed for Evidence, it's another use case that I found really interesting, and it's simple to do, so it's a Firefox extension. I'm going to show you the demo first, but first, so Christophe sent me a link with code, and he said 'this
30:30 is the current status of the code', and I'm like 'yeah, sure', so, you know, without a README, so it's kind of a challenge. I hope you don't hand over your code to your clients like that. No, it depends actually [Laughter] but no, actually I know it's a [inaudible], but to be honest I don't like to write READMEs,
30:58 but yeah, that's a challenge for me to explain my code and explain the stuff I write, because yeah, I know
31:08 this is one of my issues. I'll be sharing right now.
31:16 But in my defense, this one is like an experiment, and I
31:25 sent it to you just raw, because I was not ready to put it in the open for people, because if I had to put it in the open I would have commented stuff and added more documentation, but for the live I sent it to you raw
31:47 because of lack of time. It's fine, it's fine, I like the challenge. All right, so if you want, you can clone the repository, and so as I said
32:03 it's an extension, a Firefox extension, and what it does is that, if you use Google Cloud Storage, we can also make it easily work for S3 storage, but basically what it does is that it's popping up a small window when you're hovering over the file and it's showing you the schema of that file,
32:29 and I'm guessing, maybe I'm the only one, but please don't leave me alone, audience, in the chat tell me how many times you downloaded a Parquet file just to check the schema or whatsoever. Tell me, I'm really curious. And Christophe, did you have this use case where you just need to
32:49 check a Parquet schema, what do you do? Yeah, I
32:56 had it so many times, that was the reason I developed this stuff, because if you just think about it, when you are in the data lake and you decide to go for Parquet or any binary format, every time you just want to see your file, you have to download the
33:17 file, go into a Python virtual env, start a Jupyter notebook, import pandas, pandas.read_parquet and stuff, it was before like the [inaudible] and all,
33:32 that was the journey. And the issue with this is that, OK, you click on download, but the file is one gigabyte and you have a bad connection, so you go for a coffee, and then your day is [inaudible] and you forgot this, and yeah, this is, yeah,
33:54 it's the reason. No, it's a really common thing, and you might say also, 'oh, but you know, we have a SQL engine, would it be Athena, we have a Glue catalog, so we can query it directly from there', but sometimes you have a pipeline that fails, you have a wrong Parquet file, a single Parquet
34:15 file that you identify, and you cannot actually query it because it's messing up the schema. The SQL engines are applying the schema on read, right, and so the problem with that is that usually you create your table based on the right schema, and so if you have a small Parquet file within all your Parquet files that has
34:33 a wrong schema, you basically have a problem where you have to identify this one. And so just to show quickly, of course that was the pre-DuckDB era, but if you run just the DuckDB CLI, so on macOS
34:49 of course, you can install it with brew, you can just write this, right. Personally I recommend people use pip to install rather than... But this is
35:03 my... but this is for the CLI. As you point out, the CLI is not really available, as far as I know, with the Python package. Yes, you can, with the Python package. OK, you can point to it directly, but you need to add the path, and why would I care about that,
35:20 it's a one-line command. Oh, you're right, OK. So once you install,
35:29 so here it's a bit specific, it's a bit hacky for Google Cloud Storage. Why? Because you need to set up the endpoint to say that it's Google Cloud Storage, and you see, oh, I'm setting the S3 endpoint. This is because initially DuckDB was supporting S3 and Google Cloud Storage came along. I believe they are doing some work
35:50 around secrets, so we're going to have dedicated secrets based on the source, would it be S3 or Google Cloud Storage, and I guess for the configuration as well, I'm not sure about that. But basically for the moment you just need to set that and then your credentials. And here it's a public bucket, so
36:12 you can actually directly read it, so this is the address of the public bucket, you can go and try it. So if I do
36:23 this, I can basically directly query the data from [inaudible], so this is really magic, right, there is nothing I need to do, and I can also
36:36 describe the table if I want just the schema. This is the schema with the DuckDB types, and we're going to see right now what the Parquet types are, but you see, that's super fast. The only thing I would have needed to do is set up the credentials, and note that if you're using
36:57 AWS there is also the AWS extension, and you can directly load your AWS credentials, so you don't need to set up anything and can directly start querying your bucket, that's magic. Of course there is always network bandwidth, right, so if you target every Parquet file, it's
37:22 exactly what you mentioned, Christophe, be mindful maybe of the column selection, of your partitions, to prune certain Parquet files, yada yada. Going
37:35 back to our extension, so what is happening here? What is happening here, Christophe, please? Good question, a lot of things actually. I don't know if it was your first browser extension, but actually, yeah,
37:58 without a README, but I sent you the tutorial from Firefox to understand. Yes,
38:05 indeed. Still, yeah, in a web extension the stuff is a bit different than in a classic web page, but what happens here mainly is that in
38:17 your main web page there is a listener on the example bucket
38:25 link, and when you hover, yeah, so it's
38:30 not in the panel. Yes, I guess it's, yeah, it's here. I did a bit of modification, and by the way, it's a fork, I'm happy to contribute back and give back to Caesar what belongs to Caesar, it was your idea. But yeah, so there is a
38:52 function that's basically handling based on events, so there is a listener on events. But I'm not speaking about this one, I'm sorry, I'm more speaking about the one in the content script. OK, for the panels, yeah. So I'm speaking first about the listener on the link, so it's,
39:14 yeah, this one. So this is the mouseover, line 48, so you listen on the link, and when you hover over the link, it calls the background of the
39:32 extension, which is in the panel.js you just showed before. And in order to do this, the way extensions work is they communicate with a message system, and so on line 49 it does browser.runtime.sendMessage, and what I do is I send a message with the file name I want to read the data
39:57 from, like the Parquet schema, and on
40:01 the other side, so on the panel side, there is then a listener that listens for this message and then does the DuckDB magic to get all the Parquet information, and to get the Parquet information I actually run a query on it, so if you go to the panel, yeah, yeah, in [inaudible], yeah. And that's what we can try,
40:24 and the rest is basically, this is mostly just the panel that you see on the screenshots, right, on the demo here. And so, yeah, so this
40:36 is the listener on the mouse, and then we have basically the logic here, which loads
40:48 DuckDB-Wasm, instantiates the DuckDB
40:53 database, and creates a connection. Here we just log the DuckDB version to show this is working; we'll see it directly afterwards in the Firefox console log when we load the extension. Then
41:08 we set the credentials and the endpoint, as I said, for Google Cloud Storage. You don't need to set those if you're querying S3. And this is where you would put
41:25 some credentials for a private bucket. Here the bucket is public, so we don't need to put any credentials.
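
The endpoint and credential setup described here can be sketched as a small helper that produces the `SET` statements to run on a new DuckDB connection. This is a minimal sketch, not the extension's actual code: the `StorageSettings` type and the function names are hypothetical, and it assumes the DuckDB settings `s3_endpoint`, `s3_access_key_id`, and `s3_secret_access_key` are used with the Google Cloud Storage endpoint `storage.googleapis.com`.

```typescript
// Optional connection settings; the type and field names are illustrative.
interface StorageSettings {
  endpoint?: string;        // e.g. "storage.googleapis.com" for Google Cloud Storage
  accessKeyId?: string;     // only needed for private buckets
  secretAccessKey?: string; // only needed for private buckets
}

// Quote a value as a SQL string literal by doubling single quotes.
function sqlLiteral(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

// Build the SET statements to run on a fresh DuckDB connection.
// Settings that are not provided produce no statement.
function settingsStatements(settings: StorageSettings): string[] {
  const statements: string[] = [];
  if (settings.endpoint !== undefined) {
    statements.push("SET s3_endpoint = " + sqlLiteral(settings.endpoint) + ";");
  }
  if (settings.accessKeyId !== undefined && settings.secretAccessKey !== undefined) {
    statements.push("SET s3_access_key_id = " + sqlLiteral(settings.accessKeyId) + ";");
    statements.push("SET s3_secret_access_key = " + sqlLiteral(settings.secretAccessKey) + ";");
  }
  return statements;
}

// Public GCS bucket: only the endpoint statement is produced.
const publicGcs = settingsStatements({ endpoint: "storage.googleapis.com" });
// No endpoint and no keys: no statements are produced.
const defaults = settingsStatements({});
```

Each returned statement would then be passed to the connection's query call before any file is read; keeping the keys out of the source and passing them in at runtime matches the "secret key not in the code" idea raised later in the discussion.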
41:36 And here there is a small function to send the query, there is the listener on the mouseover, and then on the mouseover there is the function. And so this is the thing that I changed a bit, because Christophe was like, what the hell is this? But it's just...
41:56 No, no. Getting the file name; it was actually just missing something, the bracket. And then I'm breaking up the URL of the bucket. Basically, when you are on Google Cloud Storage you have a URL that looks like that, and actually, let me share
42:23 that directly. So if I go into the
42:28 console, developer, and then Cloud Storage...
42:38 Whoops. So you see, here I am on the bucket in question, and you can see how the URL is built: as part of the URL you have the bucket name. So this is what I'm inferring here, basically, to build the full path, and then I'm doing the query. So this is kind of the same thing
43:02 that I would do from the CLI, as I was just doing before, and actually I still have the CLI open, so we can run this query here. Sorry, I have the sharing, I cannot
43:18 see. I have the little sharing thing which is blocking. I cannot even help you because there is a popup on it, so I cannot even see the CLI. Oh yeah, sorry, I'm going to hide this one. Yep. You are still
43:36 seeing the... and it's
43:40 super nice because... to put it there. All right, so basically it's
43:48 this path, right? And so I have an extra
43:56 I'm going to go there, but basically this function is pretty similar to describing the table, but here we are really getting the Parquet types, right?
44:08 This is the DuckDB type, and we have more flexibility, but roughly they're the same. Another function which I want to share with you is
44:25 SUMMARIZE, which is pretty cool because it gives you statistics directly on the different fields: what's the minimum, maximum, average, and so on. And so you could actually include that in your panel, but it requires a bit more compute, right? But not really, because some of the data is already in the
44:50 Parquet file, I guess. Yes, indeed, yeah,
44:55 some of it, but I believe not everything. That's actually a good question; I can come back to that. Anyway, because we have nine minutes left, and we've mostly covered everything in the code, I think. So yeah, the query is there, we get the file path, and that's mostly
45:18 it. So you see, in the end there is not that much; there are these two files,
45:25 basically, right? Yes, for this extension you have your manifest; the manifest just specifies where the different things are and the permissions, for security reasons. So now if we launch Firefox... not now, no, I'm sorry Firefox,
45:50 not yet, I'm not ready for that yet. I'm using Brave, by the way. So
45:58 what you can do is go to about:debugging, and then
46:06 you can load a temporary add-on, so that's what we're going to do. Here, this is my repository; you can pick any file, so pick the manifest, and now you have your extension, which is loaded for this session. All right, we're also going to open the inspector; I'm going to put it there, so that will help us debug
46:33 if there is a beautiful failure in the demo, which obviously there is going to be because it's a live stream, and you can roast me in the comments if it's not working, right? Yeah, but if it's not working, it's my fault. No worries, no, because I had time to basically make it worse or make it
46:53 better. Yeah, we'll see. Anyway, I'm on the page, and here what you see happening (I don't know if I can zoom that... perfect) is the console log that we
47:09 saw before with the DuckDB version, right? And
47:13 this is the setting saying that we are querying Google Cloud Storage; it's a public bucket, so we don't need any extra credentials. And what I'm going to do, I think I just need to reload here and reload the page, because probably it's not detecting it and it's not working. So, as
47:39 expected... yes, yes, it's working. So it was just some small event. So you see, every time I'm hovering over things, what's going on is that it's building the path of my file. So if I have another one here, that would work for any file, but here it's building: okay, this is the bucket, this is the file. And actually, let me
48:05 do that quickly. What do you want to do? Oh my God, it's been a while. You want to duplicate the file? No, I'm just going to add
48:21 another file.
48:29 And just while you are looking for a file, there is something very interesting, which is that when you get the metadata, because it's a Parquet file, it only reads the metadata, so it consumes no
48:44 bandwidth. And so that's what we want, actually: we just see the schema and we don't consume any bandwidth. Yeah. I'm looking for... okay, this one is
49:01 big. I don't have any other Parquet files here. I could convert it...
49:10 One day of data; I don't know what this data is, I have no idea what it is. We're going to see. Because sensitive data,
49:22 yeah. So let's see.
49:29 I think I need to... the event page was not... looks like, no, it's okay. Boom. Ah, I know what this is: it's PyPI data, so
49:47 the Python package index, and I probably have one day of data for a specific package. But I just wanted to see, okay, so now there are multiple files, and I just hover and I have this working. That's pretty neat. So everything it does, as you said, is just, you see, running
50:07 different queries behind the scenes against DuckDB-Wasm, and it's super light, super fast. So yeah, that's
50:16 roughly it. So if you want to try it, you can go to the GitHub repository and play around. Don't forget, you need to... I'll add it to the README, because I'm a good man, I'll add it directly, but basically in Firefox you go to debugging, load the extension, pick any
50:41 file from the repository, and then have your inspector next to it so that you can restart the load of the extension and see what's going on. But yeah, a pretty neat
50:56 use case, really. Again, thank you,
51:00 Christophe, for the idea. What do you want to do next with that one? I guess the next step is finding some time to push it to production, I would say in quotes: just putting the extension on the store, maybe adding a feature where, when you click on the extension
51:22 icon, you can add your secret key and access key, so that the secret key and the access key are not in the code. And I guess make it work for S3 as well as GCS, because I guess a lot of people use S3 as cloud storage. And then maybe do it as well
51:45 for Google Chrome, because maybe more people are using Google Chrome than Firefox. And I think that would be
51:55 enough for the extension to live for a few months, years, I don't know, until people really use it. I don't know. No, until people implement it directly; I mean, that's an obvious use case for Google, right? Please, Google, update your UIs so that we don't need to build
52:17 extensions. Yeah, that's true, but to be honest, I started building extensions when I was doing stuff on dbt, and now that I know how to build extensions, every time I have an issue on a website I'm like, okay, I'm going to build an extension to fix my issue. Yeah, that's
52:37 maybe a bit too much, but yeah, I get the point. I mean, the ones I'm using are actually the black theme, what else, and other stuff, but actually not that much. Ah yeah, Wappalyzer, which
52:57 lets you inspect the kind of framework and programming language that a website is using. When I see a beautiful website I'm like, what are they using? And so yeah, this is the kind of extension we build, actually just a nerd thing to understand what's behind the scenes. Anyway, we are closing in. Christophe, thank
53:19 you so much for your time and your code. I was really just bothering you with that README, but it's fine. I have the fork in the link; I will put this one for now because it's just a bit cleaner, but I'm happy to push it back to your main repo, so let's
53:40 think about it. And have a beautiful day, viewers,
53:44 wherever you are, or a good night. It's also a good night for us, yeah; in European time it's getting to be 10 in the evening, so a bit late. See you. See you, thanks.
FAQs
What is WebAssembly (WASM) and how does DuckDB use it for analytics?
WebAssembly (WASM) lets high-performance code written in languages like C++ run in web browsers inside a sandboxed virtual machine. DuckDB compiles to WASM, so the entire analytical database engine can run directly in your browser with no server needed. This opens up use cases like interactive dashboards (as seen in tools like Evidence), browser extensions for inspecting Parquet file schemas on cloud storage, and local data exploration, all with zero server-side compute.
How can DuckDB WASM be used in a browser extension to inspect Parquet files?
The video demonstrates a Firefox browser extension that uses DuckDB WASM to inspect Parquet file schemas directly from Google Cloud Storage. When you hover over a Parquet file link in the GCS console, the extension uses DuckDB WASM running in the browser to read only the file's metadata (consuming no bandwidth for the actual data) and displays the schema in a popup panel. This removes the need to download files, open a Python virtual environment, and use pandas just to check a file's schema.
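
The hover-to-schema flow can be sketched as two small pieces: a content-script side that turns a hovered link into a message, and a background side that accepts that message. This is a minimal sketch under stated assumptions, not the extension's actual code: the `SchemaRequest` shape, the function names, the `.parquet` filter, and the assumption that the bucket follows `/storage/browser/` in the console URL are all illustrative. The `send` parameter stands in for `browser.runtime.sendMessage`, which the video names as the call used on the content-script side.

```typescript
// Message sent from the content script to the background script.
// The "type" tag and field names are illustrative, not the extension's actual protocol.
interface SchemaRequest {
  type: "parquet-schema";
  path: string;
}

// Build an object path from the console page URL and the hovered file name.
// Assumes the bucket (and any folder prefix) follows "/storage/browser/" in the URL.
function objectPath(pageUrl: string, fileName: string): string | null {
  const match = /\/storage\/browser\/([^;?#]+)/.exec(pageUrl);
  if (match === null) return null;
  const location = match[1].replace(/\/+$/, ""); // drop trailing slashes
  return "s3://" + location + "/" + fileName;
}

// Content-script side: call this from a "mouseover" listener on a file link.
// `send` stands in for browser.runtime.sendMessage.
function onLinkHover(
  pageUrl: string,
  linkText: string,
  send: (message: SchemaRequest) => void
): void {
  const fileName = linkText.trim();
  if (!/\.parquet$/i.test(fileName)) return; // ignore non-Parquet links (an assumption)
  const path = objectPath(pageUrl, fileName);
  if (path !== null) send({ type: "parquet-schema", path: path });
}

// Background side: return the path to inspect, or null for unrelated messages.
// In an extension this would run inside a browser.runtime.onMessage listener.
function pathFromMessage(message: unknown): string | null {
  if (typeof message !== "object" || message === null) return null;
  const candidate = message as { type?: unknown; path?: unknown };
  if (candidate.type !== "parquet-schema" || typeof candidate.path !== "string") return null;
  return candidate.path;
}
```

Passing `send` in as a parameter keeps both halves testable outside a browser; in a real extension the background side would hand the returned path to DuckDB-Wasm and render the result in the panel.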
What are the benefits of running analytics locally in the browser with WASM?
Running analytics in the browser via WASM has several practical benefits: no network latency on queries since no server round-trips are needed, no installation or setup for end users, reduced cloud compute costs since processing happens on the client's machine, and better data privacy since data can be decrypted and processed locally without leaving the browser. Modern laptops have significant CPU and memory resources that go largely unused, making client-side analytics increasingly practical.
How do you query files on S3 or Google Cloud Storage using DuckDB?
DuckDB can query files on object storage (S3, GCS, Cloudflare R2) using the httpfs extension, which auto-loads when you reference an S3 path. For public buckets, no credentials are needed. Just use SELECT * FROM 's3://bucket/file.parquet'. For Google Cloud Storage, you need to set the S3 endpoint to the GCS endpoint and, for private buckets, configure credentials. DuckDB auto-detects the file format from the file extension, so you don't need to explicitly call read_parquet(). The friendly SQL syntax handles it for you.
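
The query shapes mentioned in the video (describing a file, reading its Parquet schema, summarizing it, and selecting only some columns to limit bandwidth) can be sketched as small query builders. This is a minimal sketch: the helper names are hypothetical, and it assumes DuckDB's `DESCRIBE`, `SUMMARIZE`, and `parquet_schema()` are available in the environment where the query runs.

```typescript
// Quote a value as a SQL string literal by doubling single quotes.
function sqlLiteral(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

// Quote a column name as a SQL identifier by doubling double quotes.
function sqlIdentifier(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Column names and DuckDB types, as DuckDB would read the file.
function describeQuery(path: string): string {
  return "DESCRIBE SELECT * FROM " + sqlLiteral(path) + ";";
}

// Schema as stored in the Parquet file metadata (Parquet types).
function parquetSchemaQuery(path: string): string {
  return "SELECT * FROM parquet_schema(" + sqlLiteral(path) + ");";
}

// Per-column statistics such as min, max, and average.
function summarizeQuery(path: string): string {
  return "SUMMARIZE SELECT * FROM " + sqlLiteral(path) + ";";
}

// Select only the named columns; an empty list falls back to all columns.
function projectionQuery(path: string, columns: string[]): string {
  const selectList = columns.length === 0 ? "*" : columns.map(sqlIdentifier).join(", ");
  return "SELECT " + selectList + " FROM " + sqlLiteral(path) + ";";
}

// Hypothetical bucket, file, and column names for illustration.
const examplePath = "s3://bucket/file.parquet";
const exampleQueries = [
  describeQuery(examplePath),
  parquetSchemaQuery(examplePath),
  summarizeQuery(examplePath),
  projectionQuery(examplePath, ["project", "timestamp"]),
];
```

Naming columns explicitly, as in `projectionQuery`, is what lets a columnar format like Parquet avoid fetching the columns you did not ask for.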