
What Happens When You Put a Database in Your Browser?

2024/06/19


WebAssembly (Wasm) has transformed the capabilities of browsers, enabling high-performance applications without needing anything beyond the browser itself. DuckDB, which can also run in browsers via Wasm, opens up numerous possibilities. In this blog, we'll explore various use cases of DuckDB in the browser and introduce a fun, practical example that you can try yourself, complete with source code.

Why Wasm?

Wasm is a powerful tool that is gaining traction in web development. Popular applications like Figma use Wasm to run complex software written in languages such as C++ or Rust directly in the browser. This allows for fast, lightweight applications that are easy to deploy. As browsers become more capable, even utilizing WebGPU to harness GPU power directly, possibilities such as training machine learning models locally on your machine via a browser link are becoming feasible, eliminating setup hassles.

An exciting project in the Wasm ecosystem is Pyodide, which ports CPython to WebAssembly. It gives you a full Python environment in your browser from nothing more than a URL, minimizing reliance on cloud resources. Check out the Pyodide REPL here.

Current Uses of DuckDB Wasm

DuckDB is an embedded database written in C++, which makes it a natural fit for Wasm. It has been compiled to WebAssembly, allowing it to run inside any modern browser. You can experience this here by running DuckDB directly in your browser. DuckDB Wasm is particularly useful in user interfaces that need lightweight analytic operations, because running queries on the client reduces network traffic.

Here are some common scenarios:

  1. Ad-hoc queries on data lakes, such as schema exploration or data previews.
  2. Dynamic querying in dashboards by adjusting filters on-the-fly.
  3. Educational tools for SQL learning or in-browser SQL IDEs.

For example, lakeFS has integrated DuckDB Wasm for ad-hoc queries within their Web UI. Similarly, companies like Evidence and Count leverage DuckDB Wasm to enhance performance.

Figure: Running DuckDB, embedded in the lakeFS UI

Figure: Universal SQL Architecture from Evidence: Data -> Storage -> DuckDB Wasm -> Components

DuckDB Wasm as a Firefox extension

When browsing object storage (whether AWS S3, GCP Cloud Storage, or Azure Blob Storage), you often want to quickly inspect a file or its schema, whether for debugging or to preview a sample of the data.

In this small project, we have created a Firefox extension that displays the schema of Parquet files when you hover your mouse over them in GCP Cloud Storage. Here's a short video demo.

The internals are pretty simple: with DuckDB Wasm, we can run a query directly on the client side that reads the remote Parquet file and displays its metadata.

[Figure: architecture diagram]

Let's walk through the main components of the Firefox extension code, written in JavaScript.

We instantiate the database:

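// Create and initialize the DuckDB Wasm database.
// Assumes `duckdb` is the loaded DuckDB Wasm module and `bundle` is a
// bundle already chosen elsewhere (for example with duckdb.selectBundle()).
async function makeDB() {
  const logger = new duckdb.ConsoleLogger();
  // Run DuckDB in a web worker so queries don't block the UI thread.
  const worker = await duckdb.createWorker(bundle.mainWorker);
  const db = new duckdb.AsyncDuckDB(logger, worker);
  await db.instantiate(bundle.mainModule);
  return db;
}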

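Create a function to handle query results:

// Run a SQL statement and return the result as an array of plain objects.
// Assumes `conn` is a connection opened elsewhere (e.g. via db.connect()).
async function query(sql) {
  const q = await conn.query(sql); // Resolves to an Arrow table
  const rows = q.toArray().map(Object.fromEntries);
  rows.columns = q.schema.fields.map((d) => d.name);
  return rows;
}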
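To make the reshaping step in query() concrete, here is a minimal sketch that applies the same two lines to a stand-in result. The `fakeResult` object and the `toRows` name are assumptions made for illustration: the object only mimics the parts of the query result that the function touches (`toArray()` returning rows that iterate as [name, value] pairs, and `schema.fields`), and the sample values are made up.

```javascript
// Stand-in for the table returned by conn.query(); hypothetical data,
// not the real DuckDB Wasm result type.
const fakeResult = {
  schema: { fields: [{ name: 'column_name' }, { name: 'type' }] },
  toArray() {
    // Each row is a list of [name, value] pairs.
    return [
      [['column_name', 'id'], ['type', 'INT64']],
      [['column_name', 'name'], ['type', 'BYTE_ARRAY']],
    ];
  },
};

// Same reshaping as in query(): rows become plain objects, and the
// column names are attached to the array as an extra property.
function toRows(q) {
  const rows = q.toArray().map(Object.fromEntries);
  rows.columns = q.schema.fields.map((d) => d.name);
  return rows;
}

const exampleRows = toRows(fakeResult);
console.log(exampleRows[0]);
console.log(exampleRows.columns);
```

Attaching `columns` to the array keeps the column order available even when the result has no rows.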

And finally a function to handle hover events:

async function hover(request, sender, sendResponse) {
  // Extracting the file name from the request
  const fileName = request['filename'];

  // Extracting the URL from the sender (assuming it's provided)
  const url = sender.url;

  // Parsing the URL to extract the bucket name
  // Assuming the URL format is like "https://console.cloud.google.com/storage/browser/[BUCKET_NAME];..."
  const bucketName = url.split('/storage/browser/')[1].split(';')[0];

  // Constructing the file path
  const filePath = `s3://${bucketName}/${fileName}`;
  console.log(filePath);

  // Reading the column names and types from the Parquet metadata
  const schema = await query(`SELECT path_in_schema AS column_name, type FROM parquet_metadata('${filePath}');`);
  // An async function already returns a promise, so no Promise.resolve() is needed
  return { schema };
}
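The URL parsing and query construction in hover() can be pulled into small pure functions, which makes them easy to check without a browser. The sketch below is one possible way to do that, not the extension's actual code: `bucketFromConsoleUrl` and `schemaQuery` are hypothetical names, and the sketch adds two things the original does not have, a guard for URLs that lack the expected path and escaping of single quotes in the file path.

```javascript
// Hypothetical helper: pull the bucket segment out of a Cloud Storage
// console URL. Returns null when the URL does not match the assumed
// ".../storage/browser/[BUCKET_NAME];..." format.
function bucketFromConsoleUrl(url) {
  const marker = '/storage/browser/';
  const start = url.indexOf(marker);
  if (start === -1) return null;
  const rest = url.slice(start + marker.length);
  // Stop at ';' (console parameters) or '?' (query string).
  const bucket = rest.split(/[;?]/)[0];
  return bucket || null;
}

// Hypothetical helper: build the metadata query. Single quotes are
// doubled so the path cannot terminate the SQL string literal early.
function schemaQuery(bucketName, fileName) {
  const filePath = `s3://${bucketName}/${fileName}`.replace(/'/g, "''");
  return `SELECT path_in_schema AS column_name, type FROM parquet_metadata('${filePath}');`;
}

const exampleUrl =
  'https://console.cloud.google.com/storage/browser/my-bucket;tab=objects?project=demo';
console.log(schemaQuery(bucketFromConsoleUrl(exampleUrl), 'data/events.parquet'));
```

File names come from the page being browsed, so escaping them before interpolating into SQL is a cheap safeguard.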

As you can see, we use the parquet_metadata() function to retrieve the Parquet schema. After that, all that is left is to define the handler and the panel that displays the result. Check out the complete extension code here, and watch our full livestream with Christophe Blefari discussing DuckDB Wasm and this project.
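For the panel, the rows returned by the schema query need to be turned into something displayable. As a rough sketch, assuming each row has the `column_name` and `type` fields selected in the query above, a hypothetical `formatSchema` helper could render them as aligned text. The real extension may render its panel differently.

```javascript
// Hypothetical helper: render schema rows as aligned "name  type" lines.
function formatSchema(schema) {
  if (schema.length === 0) return '(no columns found)';
  // Pad every column name to the width of the longest one.
  const width = Math.max(...schema.map((row) => row.column_name.length));
  return schema
    .map((row) => `${row.column_name.padEnd(width)}  ${row.type}`)
    .join('\n');
}

// Made-up sample rows, shaped like the output of the schema query.
console.log(
  formatSchema([
    { column_name: 'id', type: 'INT64' },
    { column_name: 'name', type: 'BYTE_ARRAY' },
  ])
);
```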

What about MotherDuck?

The MotherDuck UI uses DuckDB Wasm to ensure responsive querying, especially when manipulating data already loaded locally. This means there is no need to communicate with the cloud, and both data and computing remain on your local machine.

We've also launched our Wasm SDK to enable developers to create data-driven applications using Wasm, powered by DuckDB and MotherDuck.

Moving forward

In this blog, we've seen how Wasm is already reshaping popular web applications. DuckDB Wasm offers a unique opportunity for data professionals to build faster and more efficient analytics applications.

Try out MotherDuck for free, explore our Wasm SDK, and keep coding and quacking!
