DuckDB + Streamlit: Crafting Dynamic Dashboards and Data Apps
2024/02/15
Data scientists and analysts often perform complex analyses in environments like Jupyter notebooks but face a significant challenge when it comes to sharing their findings. Moving from an insightful notebook to an interactive, accessible application traditionally requires knowledge of web frameworks, database administration, and complex deployment infrastructure. This gap creates a barrier between powerful analysis and a shareable, interactive product.
A modern, Python-native stack combining Streamlit for the user interface and DuckDB for the back-end data engine provides a powerful solution to this problem. This combination enables the rapid development of fast, self-contained, and interactive data applications, from simple dashboards to sophisticated AI-powered tools, without ever leaving the data scientist's core environment.
Why Streamlit is the Data Scientist's Choice for Web Apps
Streamlit has established itself as a go-to framework for turning Python scripts into interactive web applications. Its core value proposition lies in its simplicity. Data practitioners can build and share web apps using only Python, eliminating the need to learn front-end languages like HTML, CSS, or JavaScript. This allows them to focus on the data and logic rather than the complexities of web development.
Since its initial release in 2018, Streamlit has matured into a battle-tested framework with a robust ecosystem. Its rapid development cycle is a key feature, with a high velocity for releasing new features often requested by the community. It is common for a desired feature to appear in a new release within three to six months, a testament to the team's responsiveness. The flexibility of the framework is demonstrated by its diverse use cases, which range from interactive reports that allow stakeholders to explore data in real-time, to prototyping tools for machine learning models. It also excels at creating interfaces for Large Language Models (LLMs), such as chatbots or Text-to-SQL applications, and can even be used to build personal portfolios that showcase a data professional's skills in a dynamic format.
Ultimately, Streamlit empowers data scientists to move beyond static visualizations and reports, creating engaging tools that others can interact with using just a web browser.
Understanding DuckDB: The In-Process Engine for Analytics
While Streamlit handles the user interface, DuckDB serves as the ideal data engine for many applications. DuckDB is a fast, in-process Online Analytical Processing (OLAP) database. The "in-process" nature is a critical differentiator from traditional client-server databases like PostgreSQL or MySQL. Instead of running as a separate server that an application connects to, DuckDB runs directly within the Python process itself.
This architecture offers several key benefits in the context of building data apps:
- No Server Management: There is no separate database server to install, configure, or maintain.
- Portability: A DuckDB database can be stored as a single, portable file (e.g., stats.db). This file can be bundled directly with the Streamlit application, creating a completely self-contained tool that requires no external database connections. This is often described as making DuckDB "the SQLite for analytics."
- Performance: DuckDB is highly optimized for analytical queries (e.g., aggregations, joins, and filters) and often outperforms libraries like Pandas for complex computations on large datasets.
- Seamless Python Integration: DuckDB has excellent interoperability with Pandas DataFrames. It can ingest a DataFrame, perform high-speed SQL computations on it, and return the result as a new DataFrame, making it an ideal "middle compute" layer for existing Python data workflows, as the sketch below illustrates.
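To make the "middle compute" idea concrete, here is a minimal sketch; the DataFrame contents and column names are invented for illustration, not taken from the article:

```python
import duckdb
import pandas as pd

# A hypothetical DataFrame of raw download events.
events = pd.DataFrame({
    "day": pd.to_datetime(["2023-11-01", "2023-11-01", "2023-11-02"]),
    "python_version": ["3.10", "3.11", "3.11"],
    "downloads": [120, 340, 415],
})

# DuckDB can query the in-scope DataFrame directly (via an Apache Arrow
# replacement scan) and hand the result back as a new DataFrame.
per_version = duckdb.sql("""
    SELECT python_version, SUM(downloads) AS total_downloads
    FROM events
    GROUP BY python_version
    ORDER BY total_downloads DESC
""").df()

print(per_version)
```

Note that DuckDB discovers the events variable in the surrounding Python scope, so no explicit registration or copy is needed before querying it.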
Building a Self-Contained Dashboard with DuckDB and Streamlit
Putting the portability and performance of DuckDB into practice, developers can build a completely self-contained dashboard to analyze large datasets, such as PyPI (Python Package Index) download statistics.
The architecture is straightforward. The Streamlit script connects to a local DuckDB database file containing the download data. This single file can contain millions of rows but remains compact due to DuckDB's columnar storage format. When a user interacts with a widget in the app, such as a date range slider, Streamlit captures the selected values. These values are then used to dynamically construct a SQL query with a WHERE clause that filters the data. DuckDB executes this analytical query against the local file, aggregates the results, and returns a Pandas DataFrame, which Streamlit then visualizes using its built-in charting elements. This entire application, including the data, can be deployed to the Streamlit Community Cloud without needing to provision an external database.
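A condensed sketch of that flow might look like the following; the file name stats.db matches the example above, while the table and column names (daily_stats, day, downloads) are placeholders rather than the app's actual schema:

```python
import duckdb
import streamlit as st

# Read-only connection avoids file locks when Streamlit reruns the
# script on every widget interaction.
con = duckdb.connect("stats.db", read_only=True)

# The slider's state drives the SQL that DuckDB executes.
start_year, end_year = st.slider("Year range", 2019, 2023, (2022, 2023))

df = con.execute(
    """
    SELECT date_trunc('month', day) AS month,
           SUM(downloads)           AS downloads
    FROM daily_stats
    WHERE year(day) BETWEEN ? AND ?
    GROUP BY month
    ORDER BY month
    """,
    [start_year, end_year],
).df()

st.line_chart(df, x="month", y="downloads")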
When designing the user interface, a valuable best practice is to use the sidebar for global filters that affect the entire application, such as a date range selector. For more granular controls that apply only to a specific chart or table, it is better to place those widgets directly in the main panel next to the relevant visualization. This creates a more intuitive user experience.
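As a hedged illustration of that layout advice, with made-up data:

```python
import pandas as pd
import streamlit as st

# Global filter: lives in the sidebar and affects the whole app.
year = st.sidebar.selectbox("Year", [2021, 2022, 2023])

# Hypothetical data keyed by the global filter.
df = pd.DataFrame({
    "month": range(1, 13),
    "downloads": [(year - 2020) * 100 + m * 10 for m in range(1, 13)],
    "unique_ips": [(year - 2020) * 50 + m * 5 for m in range(1, 13)],
})

# Local control: placed in the main panel, right next to the one
# chart it affects.
metric = st.radio("Metric", ["downloads", "unique_ips"], horizontal=True)
st.line_chart(df, x="month", y=metric)
```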
Advanced Application: A Local Text-to-SQL App with Ollama
The combination of Streamlit and DuckDB can be elevated further by integrating local Large Language Models (LLMs) to create sophisticated AI applications. A compelling example is a Text-to-SQL application that allows users to query data using natural language. This architecture allows developers to build a fully local and private application, ensuring that sensitive data never leaves the user's machine.
The architecture has four key components:
- Ollama: A tool for downloading and running open-source LLMs locally, similar to how Docker runs containers.
- MotherDuck's Text-to-SQL Model: A specialized model available on Ollama that is fine-tuned to generate DuckDB-compatible SQL syntax.
- DuckDB: The local engine that executes the SQL queries generated by the LLM against the local data file.
- Streamlit: The front-end that provides a simple chat interface for the user to ask questions.
The workflow is seamless. A user enters a natural language question like "what were the total downloads per Python version?" into the Streamlit interface. To provide the necessary context for the LLM, the application first queries the DuckDB database to retrieve the table schema, including table names, column names, and data types. This schema is combined with the user's question into a detailed prompt. The application then passes this prompt to the locally running LLM, which generates the corresponding SQL query. DuckDB executes the query on the local database, and the results are returned as a DataFrame and displayed back to the user in Streamlit. This entire process happens in seconds, providing a powerful, conversational interface for data exploration without any reliance on cloud services.
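A compressed sketch of the whole loop, assuming the ollama Python client, the text-to-SQL model published on Ollama under the name duckdb-nsql, and the illustrative stats.db/daily_stats names used earlier:

```python
import duckdb
import ollama
import streamlit as st

con = duckdb.connect("stats.db", read_only=True)

# 1. Pull the table schema out of DuckDB so the model knows what exists.
schema = con.sql("DESCRIBE daily_stats").df().to_string(index=False)

question = st.text_input("Ask a question about the data")
if question:
    # 2. Combine schema and question into a single prompt.
    prompt = (
        f"Schema of table daily_stats:\n{schema}\n\n"
        f"Write a DuckDB SQL query that answers: {question}"
    )
    # 3. Ask the locally running model for SQL (no cloud involved).
    sql = ollama.generate(model="duckdb-nsql", prompt=prompt)["response"]
    st.code(sql, language="sql")
    # 4. Execute the generated query locally and display the result.
    st.dataframe(con.sql(sql).df())
```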
The Future of Rapid Data Application Development
The combination of Streamlit, DuckDB, and local LLMs represents a powerful, local-first, and Python-native stack for modern data and AI applications. This approach empowers individual data practitioners to build and share sophisticated tools that were once the exclusive domain of large engineering teams.
By bundling data directly into a portable DuckDB file and building the UI with Streamlit, developers can create self-contained applications that are easy to share and deploy. This stack is ideal for rapid prototyping, building internal tools, and developing applications in environments where data privacy and security are paramount. It effectively closes the loop between data analysis and interactive application, enabling a faster, more efficient workflow for turning insights into action.
Transcript
0:58 Hello everybody, and welcome to another episode of Quack & Code. It's actually the fourth episode already; we've been doing this every other week, and you can watch the previous episodes on our YouTube channel. In this episode we have someone from Streamlit, so we'll first talk about what Streamlit is, whether you're new to it or an experienced user. We'll also talk about DuckDB and how the two work really well together.

1:56 And then we're going to have some pragmatic fun, because the big goal of the show is to get hands-on. You'll be able to follow along; we'll share the GitHub repository, and we'll see where everything guides us. This is live coding, so of course there will be failures, because your coding skills drop about 70% when you're live, but that's totally fine, it's part of the reality: less scripted, more like the real job.

2:18 Say hi if you're joining us now in the chat, and tell us where you're from and what time it is for you. I'm from Berlin, where it's just past noon. We have people greeting us from YouTube, great, and I think people from LinkedIn are going to join; LinkedIn always takes a bit of time to get going. Anyway, let's welcome our guest, Chanin from Streamlit. Let me bring him in. How are you doing?

3:09 Hello! Great, thanks for having me.

It's great to have you. We actually know each other from YouTube live streams, I would say.
3:23 You're also known as the Data Professor on YouTube; the channel is pretty big, right? How many subscribers do you have?

170-something.

That's pretty decent: 170,000. I remember watching your videos when I was just starting on YouTube, so it's really a pleasure to have you here. And you're now working at Streamlit, right?

3:50 At Snowflake, on Snowflake.

Right, it's true: Streamlit was acquired by Snowflake. Can you maybe walk us through the past work experience that led you to being a developer advocate at Streamlit and Snowflake?

4:16 Yeah, so my journey in tech is fairly new; it's been almost three years now, since the end of 2021, when I joined Streamlit while it was still a startup. Let me take you back before that. I came from a biology major; I originally wanted to do pre-med and become a doctor someday. I graduated with a Bachelor of Science in biology, and then I did my PhD in medical technology, using machine learning and computational chemistry to design drugs for drug discovery.

5:00 After my PhD I became a professor at a university in Thailand called Mahidol University, for about 12 years; no, actually 15 years, from 2006 until 2021. Then I joined tech when Streamlit was a startup, and the following year Streamlit was acquired by Snowflake. I've been working as a developer advocate ever since the Streamlit days, currently as a senior developer advocate at Snowflake on the Streamlit team.
5:45 Wow, that's amazing. And did your YouTube journey have something to do with your switch to developer advocacy?

Yes, exactly. I would say YouTube was kind of an intermediary between academia and tech on my journey. When I was in academia, a little before COVID, around 2018, toward the end of the year, about August, before COVID hit in December, I experimented with recording tutorial videos. Initially they were about R and data science; I think the first 10 to 15 videos were primarily R, and viewers kept asking in every video, "why don't you make Python tutorials?" So I thought, okay, why not, and after that it became more about Python tutorials.

6:49 And then one of the viewers mentioned Streamlit as a web app framework. I made a tutorial video, and, 10, 20, 30 videos later, I decided to join Streamlit.
7:07 Nice. That's still a lot of videos, and that takes a lot of time. It's funny you mention R, because I've seen a lot of people in data science start with R and then switch to Python. Why do you think that switch was happening?

7:28 I think everybody's journey is slightly different. Personally I used Python, but my research group used R, because they already knew R. Then newer team members joined us, some using Python, some using R, so I became bilingual; I learned from the members, and it became natural to use both.

8:07 You mentioned research: universities were, and I think to some extent still are, influenced by the R language; actually DuckDB's early story was influenced by R people too. But with the Python ecosystem growing so fast, do you see people on the research side moving more toward Python, or is the R community still strong there? To me it feels like it's been on the decline, but what are your thoughts?

8:49 I think it comes down to which libraries, or "batteries" as we sometimes call them, come with the language you're working in. Say you want to implement something but don't want to code it from scratch: you look for existing frameworks that provide the functionality to do it. For wrangling data frames, before pandas was created there was dplyr in R; and before Matplotlib, I'm not really sure about the exact history, but R has ggplot2, which provides pretty amazing visualization. Essentially there are functionalities people are looking for that let them get the work done, and if Python has them, or R has them, why not use them?

9:59 That's pretty fair. Today a lot of the batteries are already there in Python, but there are still some use cases that remain heavily developed in R. So, what about Streamlit? Let's dive into that a bit. Do you know when the initial release happened, how old the framework basically is?
10:25 I believe it was somewhere around August 2018. There are Adrien, Amanda, and Thiago, the original founders, and I think Adrien gave a talk at one of the PyData conferences; it's still on YouTube, and if you go to the Streamlit YouTube channel we have it in a playlist. So 2018, toward the end of the year.

10:50 Wow, that's actually pretty old in terms of the data landscape, because that means six years. Some things get adopted in production in the data landscape after only one or two years, so you could say Streamlit is battle-tested by now and has a pretty mature ecosystem. And maybe, for people who don't know Streamlit and are curious, there's a gallery, right, that I can look at? Let me share that. Or do you have something?

11:37 I could share a slide deck if you want.

Yeah, if you have a link; but otherwise I'm also curious to get your view on which kinds of components or beautiful dashboards you've seen built so far.
12:03 Yeah, there are so many. I actually gave a talk on crafting a dashboard, and one of the examples used the US population dataset. The talk is on the Streamlit YouTube channel, and the underlying GitHub code repo and demo app are in there too, so anybody who wants to can clone the demo and recreate it. And these are hand-picked, curated apps from the community; all of them are pretty amazing.

12:39 Okay, so clicking here goes directly to a Streamlit app, and these are components that can be reused as well, I guess; that's how it works. Can you walk us through the high-level logic of Streamlit?

Sure. Let me see if I can share the slides; I think the best would be to share the slide deck.

13:14 Do you have a link to the slides?

Sure, let me share the link.

You can share it in the chat so people can follow along too. So, while you do that: I'm not a Streamlit expert, but when I was preparing and playing around with DuckDB I kept coming back to this gallery, which is really nice to have. I'm pretty surprised that LLMs are right at the front of the trending apps. Did you share the slides?

14:03 Yeah, I shared them in the chat, under the People tab.

So those are the slides that give a high-level look at the Streamlit framework. Can you access them?

Yeah, there you go. Give me a second.
14:33 So when I was building the app, let me bring it up here: it's always nice when you have clear documentation, and also a view of what's possible, something plug-and-play, so congrats to the team for having this.

15:07 Which slide in particular? Okay, so this is the demo I guess you were mentioning, right?

Right, exactly. So this is US population data. We have some major KPIs here, and there's a filter on the left, in the sidebar, but it's hidden, so you expand it at the top left. Let me show you the link in the chat: population-dashboard.streamlit.app. Users can play around with the years and they'll see the visualizations update.

16:08 For those who are into building interactive data apps with filters and data visualization, Streamlit provides that real-time feedback. When I was learning to code for the first time, defining variables, filtering datasets, getting subset views of the data, being able to get immediate feedback on the code I had written, in visualization form, really helped reinforce things; it provides motivation and the momentum to keep learning.

16:50 That's a really good point: when you're developing a web app, versus data engineering or data science, there's something rewarding about a UI result appearing right away, whereas when you build a data pipeline, at the end it's just a dataset. Here you can see the end use case directly. So on the left we have the filter for the year, and the page refreshes.

17:25 What's pretty neat, let me hide the chat to show people, is that there's a button here to download to CSV, which is interesting but also a bit dangerous when exposed. What's your thought on this specific feature if you expose it to business users?

17:47 Luckily this is a public dataset, so that's okay, and all the attribution sources are provided there. But for proprietary data, definitely: one possibility is performing data scrambling under the hood, so the actual data isn't in there while the aggregates still hold. Say your data is two plus two, and the summation is four; if you scramble the data into one plus three, that's not the true data anymore, but the outcome, the sum, stays the same. So there are ways to prevent leaking the raw data.

18:41 Cool. We already have a question from the chat: with Streamlit, can I create AND/OR-type filters, where one condition or another is met?

Yes. Let me share this link, the State of LLM Apps report. If you scroll down and go to the gallery, you'll be able to filter LLM-powered apps using AND or OR. There are many LLM libraries and tools, for example OpenAI, Cohere, LangChain, LlamaIndex, and you can set filters for which tools you want to see, combined with AND or OR; it's at the bottom, in the gallery.

Right there, yeah. So in the dropdown you pick the tech stack you want to see, and you can select more than one on the left. The resulting data frame at the bottom gets updated, and the screenshots of the apps update in real time too.

19:46 This one is pretty big; I see quite a lot of things in it. That's pretty nice.
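A minimal sketch of that kind of AND/OR filter; the app catalog here is made up, not the report's real data:

```python
import pandas as pd
import streamlit as st

# Hypothetical catalog of LLM apps and the tools each one uses.
apps = pd.DataFrame({
    "app": ["chat-demo", "sql-bot", "doc-search"],
    "tools": [["openai", "langchain"], ["openai"], ["llama-index"]],
})

selected = st.multiselect("Tools", ["openai", "langchain", "llama-index"])
mode = st.radio("Combine with", ["AND", "OR"], horizontal=True)

if selected:
    if mode == "AND":
        mask = apps["tools"].apply(lambda t: all(s in t for s in selected))
    else:
        mask = apps["tools"].apply(lambda t: any(s in t for s in selected))
    apps = apps[mask]

st.dataframe(apps)
```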
19:57 And it can also act as a learning resource. With data apps, in the end you want insight and information, but this feels even more notebook-style, a step-by-step thing, where I can see the information and the methodology.

20:20 Yeah, definitely. Before Streamlit, when it was starting out, Jupyter and Colab notebooks were pretty much the go-to in Python: you want to do some analysis, you create a notebook. However, if you want to deploy that, to give stakeholders, peers, or other viewers access to your work, one issue is that they have to install the prerequisite libraries, they have to know a little about the language, whether R or Python, and they need the dependencies and the underlying data. If you deploy it as a web app instead, all they need is a web browser and an internet connection to access the app you shared. All of the underlying calculation is already in the app, and what users see is what you saw during development, whether it's a data frame or a data visualization.

21:21 What you see here is essentially a report, but made into an interactive data app where users can interact, select filters and dropdowns, and view the report in a customized way. It was built using all of the data accumulated from April until the end of 2023, when LLMs essentially skyrocketed to new heights after ChatGPT was released, and innovations and breakthroughs were happening on essentially a day-by-day basis.
22:04 So we saw the need to release this report.

That's pretty neat. Something else I wanted to discuss, related to that (and we're going to get to the use case with LLMs and Streamlit, by the way): I think you already touched on it, but what are the biggest use cases you've seen around Streamlit in terms of workflow or business case? We have data scientists building apps, we have prototyping, I think LLM companies also use Streamlit for prototyping, but there are also Streamlit apps running in production. Where do you see the most common use cases among those?

22:59 I think it's essentially whatever you have in mind, your idea, your imagination: whether it's taking a dataset and providing visualization so users can view it, interact with it, engage with it, play around with the data, get to know the data, maybe provide feedback right inside the app. In LLM apps you can collect feedback; there's a Trubrics component, thumbs up, thumbs down in the sidebar, so users can tell the developer which features they like or don't like. So I essentially think the apps you can build with Streamlit are limited only by your imagination, whether it's a data app or even a resume.

23:43 Actually, when I joined Streamlit I built a resume app; I think it's currently down, because it gets viewed so heavily that it pretty much exhausted its resources. I built it to showcase my resume not as a static document but as an interactive one. So you can definitely build a portfolio using Streamlit, showcase everything, provide links. I think I have a repo on my GitHub for creating something similar to a link-in-bio, a summary of all the links you want to share. It's very helpful for content creators who want to share a media kit: all your social media links are in there, you can link to your shop, maybe an affiliate store, all the links. You could essentially build a Streamlit app to host all of that, and more.

25:02 Wow. I think that's the power of Streamlit: its flexibility. Now let's play devil's advocate: where do you see Streamlit hitting some limits?
25:19 Streamlit is an evolving web framework. For now it's essentially about implementing new features, the features the community would love to have; if it doesn't currently have something, users can post it in the forum or as a GitHub issue on the official Streamlit repo. I remember Adrien, one of the co-founders, mentioning when I was onboarding at Streamlit that the company evolves on a continual basis, and he said to play a game: envision some feature you would love to have, keep it in mind, and three to six months later, magically, it's there. I did that a couple of times, and sure enough a feature I had in mind, and had sort of forgotten about, showed up a couple of months later in one of the new releases. So I don't know what the limitations are right now; at the end of the day it helps get the job done, building an interactive data app, and I think that's all we need.

26:52 That's interesting to know, that the team velocity is pretty good, and that you can give feedback on the repository and have those features implemented. I guess you have a voting system from the community to prioritize those things?

Yeah. If you want to see what upcoming features are planned for Streamlit, you can go to roadmap.streamlit.app; let me share the link in the chat as well. There you'll see which features will be released in Q1, Q2, Q3.

27:41 I'll copy it directly into the chat. Nice. Cool, we're already almost at the halfway mark; time is flying by, so let's get a bit pragmatic and move to the coding part. What do you actually know about DuckDB?

I think of it as an interface for working with data, as a database.

Right, so it's an in-memory, in-process database, meaning there is no separate server: it runs within your Python process. The feedback I often get from Python people is, "but why would I need a database inside my Python environment?" That's totally fair, and the misconception is exactly that: because it's in-process, there is far less management, and you can use it for processing and storing data. DuckDB also has its own file format, and we'll go into that. By the way, I've seen that on the Snowflake YouTube channel there's a video about Streamlit and DuckDB, so you can go watch that too. The repository we're going to use is this one, if you want to follow along; I just cloned it, and it's basically a collection of apps.

Where I wanted to start is just to show you what kind of data I have. Here I actually have two different formats: a daily_stats.csv, and a stats.db, which is a DuckDB database. It's the same data; one is in a DuckDB database and the other is in CSV, and we're going to see that it's much smaller inside the DuckDB database. What's nice about the DuckDB database file format is that it can include multiple tables, so if you have multiple CSVs you can package them all into a single file for a simple app. This is where DuckDB and Streamlit go well together: usually you have a server database or a cloud data warehouse that your Streamlit app connects to and fetches data from, but here, for a small dataset, you can just ship the dataset directly with the Streamlit app, without any connection or any dependency in the cloud. Have you already seen that pattern, with flat files bundled into Streamlit apps?

Yeah, like SQLite, is that what you mean?

Yeah: when people push Streamlit applications to the cloud, do they embed the data as SQLite or as a CSV file directly?

I think it really depends on the use case, and I definitely see a lot of different approaches, whether connecting to a third-party database as you mentioned, or also SQLite.
31:45 Basically, DuckDB is the SQLite for analytics. In a dashboard you're typically going to do GROUP BY, COUNT, and so on, and this is where DuckDB shines, unlike a transactional database, which has a different goal. So you have the DuckDB CLI, and within the CLI you can read directly from the CSV if you want to inspect what's in it; I can also run a DESCRIBE on the table to see the schema. This is PyPI data, PyPI statistics: PyPI is the repository where Python libraries live, and here it's filtered on one project, duckdb. Each row is basically a download of the DuckDB Python library, and for each we have several pieces of information, like the Python version, the system name, the CPU. And actually it's not one row per download, sorry: we regroup it by day, so this dataset is already aggregated per day.

33:16 Just for information, for people in the chat: if you want to create a small DuckDB database, it's pretty simple. You just pass a name when you launch the CLI, like streamlit_demo, and you can give it any extension, .duck or .duckdb; I usually just use .db as a standard.

33:55 When you launch DuckDB like this, you create the database, and now you can create a table: CREATE TABLE demo AS, followed by a SELECT statement over daily_stats.csv. And that's it, the table is created: you have the demo table, loaded from the CSV. And if I exit the process now, the data you see here is persisted to the file.
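For readers following along in Python rather than in the CLI, a rough equivalent of those steps looks like this (the file and table names mirror the ones used on stream; treat them as assumptions):

```python
import duckdb

# Opening a file path creates a persistent database, just like passing
# a filename to the DuckDB CLI.
con = duckdb.connect("streamlit_demo.db")

# Load the CSV into a table; DuckDB infers the column types.
con.sql("CREATE TABLE demo AS SELECT * FROM 'daily_stats.csv'")

# Inspect the inferred schema, like DESCRIBE in the CLI.
print(con.sql("DESCRIBE demo").df())

con.close()

# Reopening the file shows the table survived the process exit.
con = duckdb.connect("streamlit_demo.db")
print(con.sql("SELECT COUNT(*) FROM demo").fetchone())
```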
34:47 So that's how easily you can create and take advantage of the DuckDB file format. Again, the cool thing is that everything is included. And Chanin, as you mentioned, it depends on the use case: if you have large data you probably don't want to bundle it directly, but for a small dataset, say below roughly 20 megabytes, I think it's pretty relevant, because you don't want to load too much. This one already has almost one million rows.

35:20 Do you know how much memory DuckDB consumes versus a typical pandas DataFrame?

That's a good question. DuckDB is written in C++ and has multiple optimizers to handle memory, so on average, for a typical computation done in pandas, DuckDB will take less memory and will be faster. And a pattern we've seen, thanks to Apache Arrow, is querying a DataFrame object directly from DuckDB. So you can still play with pandas: what I've seen people do is use pandas for the visualization part, because a lot of visual components rely on pandas, but load the data and do the compute with DuckDB; at the end you still have a pandas DataFrame that you can use.

36:32 Oh, interesting. Does that make sense to you?

Yeah, that's pretty cool.

So let's go to the code directly now.
36:49 How does that work? I built a small dashboard based on the dataset we've just seen. How am I loading it? I'm creating a connection with the database name, and the database is the one I just created. Is the font big enough for you? It's already big for me, but okay, cool.

37:27 Basically I run some queries, as you can see, to filter some data. This here is my main query, and I return the result as a pandas DataFrame; after that I can do anything I want with a Streamlit chart. For example, here I have a line chart and I just pass it the DataFrame. The bottom line is that, as a middle compute layer, DuckDB is really faster than pandas, and it's fully interchangeable with the DataFrame.

38:15 And it goes the other way too: when you have your DuckDB connection, here con, you can also query directly; note that if I do this, for example, it works as well. This is a pandas DataFrame, and DuckDB can query the pandas object directly, so it's really easy to move back and forth.

38:53 The app is a bit scrappy in terms of organization, I didn't make a lot of effort, it's a single script, but basically there's a line graph showing the number of downloads over time, another line graph for Python version usage, and then the CPU usage, so whether people are on ARM or Intel, which is interesting to know when you maintain a Python library, because you need different builds for the different architectures.

39:36 What's your go-to visualization library when you build things with Streamlit?

If it's a simple chart, the built-in line chart or bar chart that Streamlit provides is a good start. However, if your visualizations become more complex and you want to add more interactivity, Altair is pretty cool for me as well.

Which one?

Altair: A-L-T-A-I-R.

All right, can I see it in the gallery, I guess?

40:15 Yes; it's actually used in the State of LLM Apps report, for all of the data visualizations.

40:25 Hmm, it's not currently showing up there. A-L-T-A-I... let me see if I can share a link.

You said it's being used in the State of LLM Apps Streamlit app?

Yes, it's used there, and actually I coded the State of LLM Apps report myself; it took about three months in the making.

41:00 Let me quickly share this with people. What's interesting is that beyond the built-in charting there are multiple libraries you can check out.
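As a small, hedged illustration of going beyond the built-in charts, here is an Altair line chart rendered in Streamlit with made-up data:

```python
import altair as alt
import pandas as pd
import streamlit as st

df = pd.DataFrame({
    "day": pd.date_range("2023-11-01", periods=30),
    "downloads": range(100, 130),
})

# Altair gives finer control than st.line_chart, e.g. tooltips.
chart = (
    alt.Chart(df)
    .mark_line(point=True)
    .encode(x="day:T", y="downloads:Q", tooltip=["day:T", "downloads:Q"])
)
st.altair_chart(chart, use_container_width=True)
```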
41:26 All right. So once I have my data loading and some transformations in pandas, and I pass the results to the built-in charts, I just need to run the app with streamlit run, which launches the web server, and then I can open it. That's it; that was fast.

41:55 You can see we load about one million rows here, and there's the total downloads; the data isn't up to date, because I think it goes back to November 2023. That's the monthly downloads over time, the total downloads, and the Python version usage.

42:22 And here is the CPU usage. We still mostly see the x86_64 architecture, but ARM is getting some traction. x86 is still a huge percentage of the ecosystem, though that may change with Apple silicon, which is a different architecture.
42:52 I'm curious about the filters: here the filter is on the left panel. Can you actually put it anywhere you want, or what's the best practice? What do you do?

There are many ways. What you've done here is put it in the sidebar, which is typically st.sidebar followed by the name of whatever widget you want to place there. Personally, I like to place the filter widgets for a specific data frame near the actual data frame, so people can see that these widgets influence the data frame or data visualization right underneath them. What I like to put in the sidebar are widgets that make global changes to the app, like the year to display, which applies to the entirety of the app; for more fine-granular detail, I think the main panel works better.

44:06 That's really good advice: keep the side panel for things that impact everything, and put other filters next to the relevant graph so it's easier to understand. And for dates, I think it's usually fair to have them as a global filter, because it can get really confusing if one graph has one date and another graph a different one; you can end up with funky interpretations.

44:44 Sorry, someone asked something...

What you have here is already perfect.

Cool, thanks, I appreciate it, since I'm still very much a Streamlit newbie. Someone in the chat asked for the GitHub repo on LinkedIn, so let me post it there, if LinkedIn cooperates; it seems to want me to reload.
45:23 So this is one example of using DuckDB, and I guess you could push the app on the free tier, the community free tier; I can push this directly to Streamlit's cloud?

Yeah, Community Cloud. We only have one tier, which is the Community Cloud, so users can deploy their apps for free if they're public apps, unlimited public apps. But if they want to create private apps, they're allocated one private app per account.

46:06 Cool. So this is pretty straightforward: you have a GitHub repo, you connect it, and you push.

Essentially you could either push it from your CLI, or you could do it visually by just going to your GitHub repo; if you edit the repo, the app updates automatically.

46:28 That's pretty handy. Another thing I wanted to show now is an example with LLMs, because I know some LLM companies have been using Streamlit a lot to prototype and try a model. If you go to the Streamlit gallery, as we said, there are the LLM apps; I've seen some with basically just a chat box you can use, which is typically what you want, to be able to interact with an LLM. So you have a chatbot component available, right?

47:15 Yeah. Actually we have a documentation page about how to build chatbots, which provides code snippets, including a very simple snippet to build a basic chatbot using, say, OpenAI. We also have a tutorial on the Streamlit YouTube channel showing how to use an open-source model like Llama 2 to build a chatbot. And there's this blog post from Adrien about how Streamlit and generative AI are a perfect match; I've shared the link with you, Mehdi.

48:19 Normally you can access the global chat and post it directly, but let me copy it and put it here. Okay, cool; oh, I'm connected with my other account, let me fix that.
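The chat elements referenced here are st.chat_message and st.chat_input; a minimal skeleton (an echo bot with no LLM attached) could look like this:

```python
import streamlit as st

st.title("Minimal chat skeleton")

# Keep the conversation across Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay history on each rerun.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    # A real app would call an LLM here; we just echo the input back.
    reply = f"You said: {prompt}"
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)
```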
48:26 That's interesting. And this is what I wanted to close with and have some fun: MotherDuck released a text-to-SQL model. If you go to Hugging Face, you have this text-to-SQL model available, and what it does is let you describe in natural language the kind of query you want to write, and it gives you the DuckDB query. It's trained against the DuckDB documentation, for DuckDB syntax specifically. You can also see that the demo app on Hugging Face is using Streamlit: if you go to the files on Hugging Face and open the app, you see it's used for this specific demo, and for the demo the model is hosted behind an endpoint.

49:33 But what I wanted to do here, because DuckDB is great for prototyping locally with local data: are you familiar with Ollama?

I've heard of it, for running local LLM models.

Exactly. It's basically kind of a Docker for LLMs: it packages your model and you can run it locally. You have a bunch of models available, and you just issue the run command; it downloads the artifact and then you can run it. And as you can see, the DuckDB NSQL text-to-SQL model is available there.

50:29 That's pretty fun, because it means I can run the model locally and ask basically any question, say, "how to read a CSV called demo.csv". It gave me the syntax; I just made a typo while hitting enter, which is why there's this error, but it's running locally. And what I can also do is run a server locally; the port is already in use, so let me kill that just to show you the server. All right, now I have the Ollama server, and it listens for any request and routes it to whichever model I want to use among those I've already downloaded locally. Does that make sense so far?

51:23 Sounds good.
51:28 And Ollama released a small Python SDK, which is why I found this perfect; they released it about a week ago, and before that you had to wire up a couple of things yourself. You see here, this is how I'm calling my local server. There's a trick, because I'm running my VS Code inside a Docker container here. Which editor are you using, by the way; what's your go-to?

VS Code, and also GitHub Codespaces in the cloud.

52:04 Cool, so you're using the dev container features; I guess you're familiar with this. Dev containers basically let you run your VS Code within a Docker environment, and GitHub Codespaces uses that. So the trick here is just to set the host to this address, because on this port of my local machine I have the Ollama server running; that's why I specify it. If you're not using a dev container you can skip that, because you'll just use the client's default host. And here I'm doing the same thing again and printing the result, so let me just run it, and now we're going to get a query.

53:00 If I go to my logs here, you see that I received a request; it used the model I specified and gave back the query result. So that's pretty cool: everything is still local.
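A minimal sketch of that client call with the ollama Python package; the host value is an assumption for a Docker Desktop dev-container setup, and duckdb-nsql is the model name suggested on stream:

```python
import ollama

# Point the client at the Ollama server. Inside a dev container the
# host differs from the default localhost; "host.docker.internal" is
# an assumption for Docker Desktop setups.
client = ollama.Client(host="http://host.docker.internal:11434")

# Ask the locally running text-to-SQL model for a DuckDB query.
response = client.generate(
    model="duckdb-nsql",
    prompt="how to read a CSV called demo.csv",
)
print(response["response"])
```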
53:23 I'm not going to disconnect my internet to prove it, because otherwise I'd be disconnected from the stream, but the model is running locally and I'm just talking to it. Now, coming back to Streamlit and DuckDB: what I did is again just read the same dataset, the same DuckDB table. You saw earlier that I used the DESCRIBE command to get the schema; here I'm doing that and taking the schema result to inject into the prompt, so whenever I ask a question, the model has the schema metadata of my dataset. This is done automatically: here I specify the table I want to use in the query, and after that, that's basically it. I connect to Ollama again and pass the full prompt, which is the schema description I'm injecting plus the user's question.

54:42 So now, the real moment of truth: if I run the DuckGPT app and open it, I just have a natural-language input. Let's say: "give me the total downloads of packages." It's interesting to see how it interprets that given the schema. You can see it displays the query, which here is pretty straightforward, just a SELECT SUM over the daily count, and I get the 25 million, which matches the number on the first dashboard, if you remember.

55:35 So that's pretty neat interactivity in a Streamlit app: with just a button, the question goes to the local model and the query runs on local data within DuckDB. I can also try to reproduce the dashboard's Python version chart: "give me the sum per Python version." Sometimes it's actually easier to just write the SQL yourself, but let's see how it interprets that. I think it's right; yes, it grouped by Python version, the sum per Python version.

56:40 Okay, so this is it. You saw that was really fast, within seconds, and what's happening behind the scenes is that it contacts the local text-to-SQL model, interprets the natural-language question, generates the query, and then queries the local dataset. What do you think?

57:07 That looks amazing, and it shows the relevant SQL query as well.
57:17 Some closing thoughts. Hopefully this inspires you, the chat and the audience, to play around with DuckDB locally, to understand how the DuckDB file format works and its power to process data, and how easily interchangeable it is with pandas DataFrames. Then you can go further with a text-to-SQL model; there's also an endpoint available, but you can just download the model using Ollama. Did that spark some curiosity to try Ollama and play with some of this on your side?

58:02 Yes, definitely.

Maybe what you could do with the State of LLM Apps report is add some text-to-SQL queries on top, so people can ask questions about the report's data.

Yeah, that's a great idea.

58:30 What's next for you and for Streamlit? First for you: what are you busy with right now and looking forward to releasing? And at the Streamlit level, can you share a few thoughts?

58:45 At the micro level, I'm currently working on some internal apps; we're trying to build internal apps with Streamlit that are used for events organized by Snowflake. Another thing I'm working on is creating educational content, whether it's how to get started with Streamlit or how to use Streamlit for various use cases, like large language models or data science. Teaching is one of my passions, and I think it's one of the transferable skills I kept from academia: I like to teach, so let's teach and make content on YouTube, maybe write blog posts as well, repurposing some of the content, and maybe draw some infographics to consolidate and simplify some hard or difficult concepts. And thanks for sharing the link.

60:15 Plenty to learn. Like I said, when I started my own YouTube journey, which was way after yours, you were definitely inspiring and I learned a few things, so it's really great to see you continuing in that direction. I personally love it, and I think a lot of people do; you got some love in the chat from people recognizing you as the Data Professor. So please go check out his YouTube channel, and everything he does at Streamlit. I'm happy to reshare any Streamlit app you build, especially if it's using text-to-SQL, because that's a pretty hot topic and fun to play with.

Yeah, definitely. And I'd like to explore the DuckGPT app you've created some more; maybe we could collaborate somehow, write some blog posts together.

Sure, I'd love to. To the audience and the chat: thanks for joining. We do this live stream every other week, so not next week but the week after, and in the next episode we're going to talk about DuckDB and dbt; I'm sure a lot of people in the audience use dbt, so go check it out. Subscribe to the YouTube channel if you don't want to miss the event, that's the best place, or follow the LinkedIn page, and I'll see you there. Thank you, Chanin, for joining.

61:19 Thank you for having me.
FAQs
How do DuckDB and Streamlit work together for building data apps?
DuckDB is an in-process analytical database that handles data loading and transformation, while Streamlit provides the web application framework for interactive visualization. You create a DuckDB connection, run SQL queries that return Pandas DataFrames, and pass those DataFrames directly to Streamlit chart components. Because DuckDB runs in the same Python process, there is no need for a separate database server. This makes the setup ideal for lightweight, self-contained data apps.
Can you use the DuckDB database file format to bundle data with a Streamlit app?
Yes. DuckDB's file format lets you package multiple tables into a single compact database file that you deploy alongside your Streamlit app. This is significantly smaller than equivalent CSV files and eliminates the need for a cloud database connection for small to medium datasets (roughly under 20 MB). The app reads directly from the DuckDB file, making it fully self-contained and deployable to Streamlit Community Cloud for free.
How can you build a natural language to SQL chatbot using DuckDB and Streamlit?
Combine a text-to-SQL model (like MotherDuck's DuckDB text-to-SQL model running locally via Ollama) with DuckDB and Streamlit. The app injects the database schema metadata into the prompt, the user types a natural language question, the model generates DuckDB-compatible SQL, and DuckDB executes the query locally. Results display instantly in the Streamlit interface, all without any cloud dependency or internet connection. To learn more, see the DuckDB + Python quickstart.

Related Videos
2026-01-21
The MCP Sessions - Vol 2: Supply Chain Analytics
Jacob and Alex from MotherDuck query data using the MotherDuck MCP. Watch as they analyze 180,000 rows of shipment data through conversational AI, uncovering late delivery patterns, profitability insights, and operational trends with no SQL required!
2026-01-13
The MCP Sessions Vol. 1: Sports Analytics
Watch us dive into NFL playoff odds and PGA Tour stats using MotherDuck's MCP server with Claude. See how to analyze data, build visualizations, and iterate on insights in real-time using natural language queries and DuckDB.

2025-12-10
Watch Me Deploy a DuckLake to Production with MotherDuck!
In this video, Hoyt Emerson will show you the fastest way to get DuckLake into production using MotherDuck's beta implementation. If you've been following his DuckLake series, this is the next step you've been waiting for!

