# MotherDuck Documentation - Getting Started > Scoped full Markdown content for Getting Started. For other areas, start from https://motherduck.com/docs/llms.txt instead of loading unrelated documentation. ## Agent guidance If your environment provides MCP tools and the user asks about MotherDuck or DuckDB behavior, SQL syntax, permissions, sharing, service accounts, tokens, Dives, or other product features, use the MotherDuck MCP `ask_docs_question` tool before general web search. It answers from official DuckDB and MotherDuck documentation. For broad context, prefer the most specific scoped `llms-full.txt` file listed in https://motherduck.com/docs/llms.txt before loading the root `llms-full.txt`. The root file contains the complete public documentation corpus and is intended for bulk indexing or large-context workflows. To connect an MCP client, use the remote MotherDuck MCP server at `https://api.motherduck.com/mcp`. Setup instructions: https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-setup. Tool reference: https://motherduck.com/docs/sql-reference/mcp/ask-docs-question. --- Source: https://motherduck.com/docs/getting-started/e2e-tutorial/part-1 # 1 - Running Your First Query > Learn MotherDuck and DuckDB by running your first queries on shared data In this multi-part tutorial, you will go through a full end-to-end example on how to use MotherDuck and DuckDB, **push** and **share** data, take advantage of **hybrid query** execution and query data using SQL through the **MotherDuck UI** or **DuckDB CLI**. :::note MotherDuck supports DuckDB 1.5.3. In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions 1.4.0 through 1.5.3. In **US West (Oregon) -** `us-west-2`, MotherDuck supports client versions 1.4.1 through 1.5.3. In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client versions 1.4.1 through 1.5.3. ::: ## Running your first query ### Query from a shared database Before playing with the dataset we just downloaded, let's run a couple simple queries on the shared sample database. This database contains a series of MotherDuck's public datasets and it's *auto-attached* for each user, meaning it's accessible directly within your MotherDuck session without any additional setup. We will query the NYC 311 dataset first. This dataset contains over thirty million complaints citizens have filed with the New York City government. We'll select several columns and look at the complaints filed over a few days to demonstrate the [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/) feature of the MotherDuck UI. Want to explore the full interface? Try running this query in the [MotherDuck UI](https://app.motherduck.com/) to experience the complete dashboard, visual query builder, and advanced analytics features. :::info In the MotherDuck UI, the Column Explorer provides quick visual summaries of your data, helping you understand distributions and patterns at a glance. ![Column Explorer showing data distribution summaries in the MotherDuck UI](./img/demo_ui_column_explorer.png) ::: For the remainder of this tutorial, we'll focus on the NYC taxi data and perform aggregation queries representative of the types of queries often performed in analytics databases. We will first get the average fare based on the number of passengers. The source dataset covers data for the whole month of November 2022. :::info The `sample_data` database is auto-attached but for any other shared database you would like to read, you would need to use the `ATTACH` statement. Read more about [querying shared MotherDuck databases](/key-tasks/sharing-data/sharing-data.mdx). ::: :::tip **Using a DuckDB client?** You can run these same queries in any of the DuckDB client after connecting with `ATTACH 'md:';` - you'll be prompted to authenticate if no `motherduck_token` is found as environment variable. ::: ### Query from S3 Our shared sample database is great to play with but you probably want to use your own data on AWS S3. Let's see how to do that. The sample database source data is actually available on our public AWS S3 bucket. Let's run the exact same query but instead of pointing to a MotherDuck table, we will point to a parquet file on S3. For a secured bucket, we need to pass the AWS credentials - check [authenticating to S3](../../integrations/cloud-storage/amazon-s3.mdx) for more information. Here's the updated query while reading from S3: :::info DuckDB automatically detects the appropriate reader based on file extension, so there’s no need to explicitly specify a function. However, if you need more control over how files are read, you can use the corresponding functions directly: ```sql SELECT * FROM read_parquet('my_data.parquet'); SELECT * FROM read_csv_auto('my_data.csv'); SELECT * FROM read_json_auto('my_data.json'); ``` These functions allow you to customize parsing behavior or override automatic detection when needed. ::: ## Next steps Great! You've successfully run your first queries on MotherDuck. You've learned how to: ✅ Query shared databases like `sample_data` ✅ Read data directly from S3 👉 **[Continue to Part 2: Loading Your Dataset →](../part-2)** --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/c # C > MotherDuck + C The MotherDuck integration with C is no different than DuckDB. For more information, see [C](https://duckdb.org/docs/stable/clients/c/overview.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/installation-authentication # Installation & authentication > How to install DuckDB and connect to MotherDuck ## Prerequisites MotherDuck Python supports the following operating systems: - Linux (x64, glibc v2.31+, equivalent to ubuntu v20.04+) - Mac OSX 11+ (M1/ARM or x64) - Python 3.4 or later Please let us know if your configuration is unsupported. ## Installing DuckDB :::note MotherDuck supports DuckDB 1.5.3. In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions 1.4.0 through 1.5.3. In **US West (Oregon) -** `us-west-2`, MotherDuck supports client versions 1.4.1 through 1.5.3. In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client versions 1.4.1 through 1.5.3. ::: Use the following `pip` command to install the supported version of DuckDB:

{`pip install duckdb==${ duckdbVersionRanges["us-east-1"].max }`}

## Connect to MotherDuck

You can connect to and work with multiple local and MotherDuck-hosted DuckDB databases at the same time. The connection syntax varies depending on how you’re opening local DuckDB and MotherDuck.

### Authenticating to MotherDuck

You can authenticate to MotherDuck using either browser-based authentication or an access token. Here are examples of both methods:

#### Using browser-based authentication

```python
import duckdb

# connect to MotherDuck using 'md:' or 'motherduck:'
con = duckdb.connect('md:')
```

When you run this code:

1. A URL and a code will be displayed in your terminal.
2. Your default web browser will automatically open to the URL.
3. You'll see a confirmation request to approve the connection.
4. Once, approved, if you're not already logged in to MotherDuck, you'll be prompted to do so.
5. Finally, you can close the browser tab and return to your Python environment.

This method is convenient for interactive sessions and doesn't require managing access tokens.

#### Using an access token

For automated scripts or environments where browser-based auth isn't suitable, you can use an access token:

```python
import duckdb

# Initiate a MotherDuck connection using an access token
con = duckdb.connect('md:?motherduck_token=')
```

Replace `` with an actual token generated from the MotherDuck UI.

To learn more about creating and managing access tokens, as well as other authentication options, see our guide on [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md).

### Connecting to MotherDuck

Once you've authenticated, you can connect to MotherDuck and start working with your data. Let's look at a few common scenarios.

#### Connecting directly to MotherDuck

Here's how to connect to MotherDuck and run a simple query:

```python
import duckdb

# Connect to MotherDuck via browser-based authentication
con = duckdb.connect('md:my_db')

# Run a query to verify the connection
con.sql("SHOW DATABASES").show()
```

:::tip
When connecting to MotherDuck, you need to specify a database name (like `my_db` in the example). If you're a new user, a default database called `my_db` is automatically created when your account is first set up. You can query any table in your connected database by just using its name. To switch databases, use the `USE` command.
:::

#### Working with both MotherDuck and local databases

MotherDuck lets you work with both cloud and local databases simultaneously. Here's how:

````python
import duckdb

# Connect to MotherDuck first, specifying a database
con = duckdb.connect('md:my_db')

# Then attach local DuckDB databases
con.sql("ATTACH 'local_database1.duckdb'")
con.sql("ATTACH 'local_database2.duckdb'")

# List all connected databases
con.sql("SHOW DATABASES").show()
````

#### Adding MotherDuck to an existing local connection

If you're already working with a local DuckDB database, you can add a MotherDuck connection:

````python
import duckdb

# Start with a local DuckDB database
local_con = duckdb.connect('local_database.duckdb')

# Add a MotherDuck connection, specifying a database
local_con.sql("ATTACH 'md:my_db'")
````

This is another approach to give you the flexibility to work with both local and cloud data in the same session.

---

Source: https://motherduck.com/docs/getting-started/e2e-tutorial/part-2

# 2 - Loading Your Data
> Learn how to load your own datasets into MotherDuck
In this section, you'll learn how to load your own data into MotherDuck and run powerful hybrid queries that combine local and cloud data.

👈 **[Go back to Part 1: Running Your First Query](../part-1)**

## Loading your data

### Loading data using CREATE TABLE AS SELECT

The `CREATE TABLE AS SELECT` (CTAS) pattern creates a new table and populates it with data in a single operation:

```sql
CREATE OR REPLACE TABLE docs_playground.my_table AS SELECT * FROM 'my_data.csv';
```

### Loading data using INSERT INTO

The `INSERT INTO` pattern lets you append data to existing tables, update specific records, and manage data incrementally:

```sql
-- First, create the table structure
CREATE TABLE docs_playground.my_table AS SELECT * FROM 'my_data.csv' LIMIT 0;

-- Then load data incrementally
INSERT INTO docs_playground.my_table SELECT * FROM 'new_data.csv';
INSERT OR REPLACE INTO docs_playground.my_table SELECT * FROM 'updated_data.csv';
```

:::tip
While `CREATE TABLE AS SELECT` is convenient for one-time loads or small datasets, for larger datasets and production workflows, we recommend using `INSERT INTO`. This approach provides better control over data loading, allows for incremental updates, and is more efficient for ongoing data management.
:::

There are several ways to get your data into MotherDuck, depending on where your data lives:

### From local file system

To load data files from your file system into MotherDuck, you'll need:

1. A valid MotherDuck token stored as the `motherduck_token` environment variable
2. A DuckDB client (DuckDB CLI, Python, etc.)

To create a MotherDuck token, navigate to the MotherDuck UI, click your organization name in the top left, then go to **Settings > Integrations > Access Token**. For detailed instructions, see our [authentication guide](../../key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md).

### DuckDB CLI

Install the DuckDB CLI for macOS/Linux. For other operating systems, see the [DuckDB installation guide](https://duckdb.org/docs/installation/).

```bash
curl -s https://install.motherduck.com | sh
```

Launch the DuckDB CLI:

```bash
duckdb
```

```sql
-- Connect to MotherDuck
ATTACH 'md:';

-- Load CSV data from your local file into the playground database
CREATE TABLE docs_playground.popular_currency_rate_dollar AS
SELECT * FROM './popular_currency_rate_dollar.csv';
```

### Python

Install DuckDB using your preferred package manager, such as pip:

```bash
pip install duckdb
```

```python
import duckdb

# Connect to MotherDuck
conn = duckdb.connect('md:')

# Load data into the playground database (automatically created)
conn.execute("""
    CREATE TABLE docs_playground.popular_currency_rate_dollar AS
    SELECT * FROM './popular_currency_rate_dollar.csv'
""")
```

### MotherDuck UI

Head over to the `Add data` button in the MotherDuck UI and upload your file directly. This works great for smaller files and provides a visual interface.

![Add file](./img/screenshot_add_data.png)

![load data](./img/screenshot_loading_data2.png)

### From remote storage (S3, GCS, etc.)

For data already stored in cloud storage, you have multiple options:

### SQL

You can load public remote data into your playground database using our interactive SQL editor:



### DuckDB CLI

```sql
ATTACH 'md:';
CREATE TABLE docs_playground.popular_currency_rate_dollar AS
SELECT * FROM 's3://us-prd-motherduck-open-datasets/misc/csv/popular_currency_rate_dollar.csv';
```

### Python

```python
import duckdb
conn = duckdb.connect('md:')
conn.execute("""
    CREATE TABLE docs_playground.popular_currency_rate_dollar AS
    SELECT * FROM 's3://your-bucket/your-file.csv'
""")
```

### MotherDuck UI

1. In the left panel of the UI, click **Add data**
2. Select **From cloud storage**
3. For a publicly accessible bucket, skip creating a secret
4. Switch to **Wildcard** mode, and enter the S3 path `s3://us-prd-motherduck-open-datasets/**/popular_currency_rate_dollar.csv`
5. Name the table `popular_currency_rate_dollar` and select `docs_playground` as the destination database
6. Click **Create table**

![Create table from S3](./img/screenshot_ui_create_table_from_s3.png)

For more details, see [Loading Data from Cloud Storage](../../key-tasks/loading-data-into-motherduck/loading-data-from-cloud-or-https.md).

:::info
For private AWS s3 buckets, you'll need to configure AWS credentials. Check our [AWS s3 authentication guide](../../integrations/cloud-storage/amazon-s3.mdx) for details.
:::

### Querying your data

Once your data is loaded, you can query it from any interface:

### SQL



### DuckDB CLI

```sql
ATTACH 'md:';
FROM docs_playground.popular_currency_rate_dollar LIMIT 10;
```

### Python

```python
import duckdb

# Connect to MotherDuck
conn = duckdb.connect('md:')

# Query your data
result = conn.sql("FROM docs_playground.popular_currency_rate_dollar LIMIT 10").fetchall()
print(result)
```

👉 **[Continue to Part 3: Sharing Your Database →](../part-3)**

---

Source: https://motherduck.com/docs/getting-started/data-warehouse

# Data Warehousing Overview
> Learn to use MotherDuck as a Data Warehouse
## Introduction to MotherDuck for data warehousing

MotherDuck is a cloud-native data warehouse built on top of [DuckDB](https://duckdb.org/docs/sql/introduction) that adds enterprise features like cloud storage, sharing, and collaboration to DuckDB's fast analytical engine. The platform serves these needs through its serverless architecture, sharing model, and WASM capabilities. It benefits data analysts with AI-assisted SQL, data engineers with familiar tools like dbt, and data scientists with hybrid local-cloud processing.

![img_duck_stack](img/bi_tool.svg)

MotherDuck integrates with popular data tools including [Estuary](https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/), [Fivetran](https://fivetran.com/docs/destinations/motherduck#motherduck), and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) for data ingestion, [dbt](/docs/integrations/transformation/dbt) for transformations, [Tableau](/integrations/bi-tools/tableau/) and [PowerBI](/integrations/bi-tools/powerbi/) for visualization, and [Airflow](https://airflow.apache.org/docs/) and [Dagster](https://docs.dagster.io/integrations/libraries/duckdb/using-duckdb-with-dagster) for orchestration. This enables teams to build data warehousing solutions using their existing tools.

## Data ingestion

An easy way to get into MotherDuck is using [ecosystem partners](/integrations/ingestion/) like [Estuary](https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/), [Fivetran](https://fivetran.com/docs/destinations/motherduck), [dlthub](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck), and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) but you can also create custom data engineering pipelines.

MotherDuck is very flexible with how to load your data:

- **From data you have on your filesystem:** If you have CSVs, JSON files or DuckDB databases sitting around, It's easy to load it into your MotherDuck data warehouse.
- **From a data lake on a cloud object store:** If you already have your data in a data lake, as parquet, delta, iceberg or other formats, DuckDB has abstractions for Secrets, Object Storage, and many file types. When combined, this means that many file types can be read into DuckDB from Object Storage with only SQL. Though not as performant as MotherDuck's native storage layer, you can also query your infrequently-accessed data directly from your data lake with MotherDuck.
- **Using Native APIs in many languages:** DuckDB supports numerous languages such as C++, Python, and Java, in addition to its own mostly Postgres-compatible SQL dialect. Using these languages, Data Engineers and Developers can integrate with MotherDuck without having to pick up yet-another-language.

### Best practices for programmatic loading

The fastest way to load data is to load single tables in large batches, saturating the network connection between MotherDuck and the source data. DuckDB is incredibly good at handling both files and some kinds of in-memory objects, like Arrow dataframes. As an aside, Parquet files compress at 5-10x compared to CSV, which means you can get 5-10x more throughput by using Parquet files. Similarly, open table formats like Delta & Iceberg share those performance gains.

On the other hand, small writes on multiple tables will lead to suboptimal performance. While MotherDuck does indeed offer [ACID compliance](https://duckdb.org/2024/09/25/changing-data-with-confidence-and-acid.html), it is not an OLTP system like Postgres! Significantly better performance can be achieved by using queues to batch writes to tables. While some latency is introduced with this methodology, the improvement in throughput should far outweigh the cost of doing small writes.

Streaming workloads are better suited to be handled with queues in front of MotherDuck.

## Transforming data

Once data is loaded into MotherDuck, it must be transformed into a model that matches the business purpose and needs. This can be done directly in MotherDuck using the powerful library of SQL functions offered by [DuckDB](https://duckdb.org/docs/sql/introduction.html). Many data engineers prefer to use data transformation tools like the open source [dbt Core](https://github.com/dbt-labs/dbt-core). More details specifically about using dbt with MotherDuck can be read in the [blog on this topic](https://motherduck.com/blog/duckdb-dbt-e2e-data-engineering-project-part-2/).

For more in-depth reading, the free **[DuckDB in Action eBook](https://motherduck.com/duckdb-book-brief/)** explores these concepts with real-world examples.

## Sharing data

Once your data is loaded into MotherDuck and appropriately transformed for use by your analysts, you can make that data available using MotherDuck's [sharing capabilities](/key-tasks/sharing-data/sharing-overview/). This can allow every user in your organization to access the data warehouse in the MotherDuck UI, in their Python code or with other tools. Admins don't need to worry that the queries run by users will impact their data pipelines as users have isolated compute.

## Serving data analytics

Do you want to serve reports or dashboards for your users? MotherDuck provides tokens that can be used with [popular tools](/integrations/bi-tools/) like Tableau & Power BI to access your data warehouse to serve business intelligence to end users.

### Ducks all the way down: Building data apps

MotherDuck is built on DuckDB because it is an extremely efficient SQL engine inside a ~20MB executable. This lets you run the same DuckDB engine which powers your data warehouse inside your web browser, creating highly-interactive visualizations with near-zero latency. This enhances your experience when using the [Column Explorer](/getting-started/interfaces/motherduck-quick-tour/#column-explorer) in the MotherDuck UI.

One thing that is unique to MotherDuck is its capabilities for serving data into the web layer through [WASM](/sql-reference/wasm-client). These capabilities enable novel analytical user actions, including very intensive queries that would be prohibitively expensive in other query engines. It also supports data mashup from various sources, so that data in the warehouse can be combined with other sources, like files in CSV, JSON, or Parquet.

## Scaling up & out for DWH use cases

Furthermore, MotherDuck has a unique scaling model, of which there are four key concepts relevant for Data Warehousing.

### Vertical scaling

Compute can scale up with larger DuckDB compute instances called Ducklings. MotherDuck offers 5 sizes: [Pulse, Standard, Jumbo, Mega, and Giga](/about-motherduck/billing/duckling-sizes/).

Unlike other data warehouses, every Duckling (compute instance) is isolated from each other: one user's queries will not impact another user's from completing. This [hypertenancy](/concepts/hypertenancy) model assures you can size your warehouse correctly and use your resources very efficiently.

### Horizontal scaling

For serving data to BI tools or other spiky consumers, [Read Scaling Replicas](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) can absorb the loads and maintain low latency on user interactivity. These should be owned by the same user or service accounts that run production jobs, although they can also leverage [`SHARES`](/key-tasks/sharing-data/sharing-overview/) depending on preferences.

### Hypertenancy

Especially for production runs, use separate user accounts or [service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) with dedicated compute for updating and maintaining core tables.

### Distributed DuckDB

DuckDB and MotherDuck work together as a distributed system that automatically optimizes query execution between local and cloud resources through Dual Execution, enabling efficient data access regardless of location.

## Orchestration

To keep data up to date inside of MotherDuck, often an orchestrator like [Airflow](https://airflow.apache.org/) or [Dagster](https://dagster.io/) can be used. This runs jobs in specific orders to load & transform data, as well managing workflow and observability, which is necessary for handling more complex data engineering pipelines.

If this is your first data warehouse, you might consider starting with something as simple as [GitHub actions](https://github.com/features/actions) or cron jobs to orchestrate your data pipelines.

:::info
For a more in-depth guide, check out the [Data Warehousing Guide](/key-tasks/data-warehousing/)
:::



---

Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/go

# Go
> MotherDuck + GoLang
The MotherDuck integration with Go is no different than DuckDB. For more information, see [Go](https://duckdb.org/docs/stable/clients/go.html) in DuckDB Documentation.

---

Source: https://motherduck.com/docs/getting-started/sample-data-queries/hacker-news

# Hacker News
> Sample data from Hacker News stories to use for SQL querying of DuckDB and MotherDuck databases.
## Explore the data

Interactive dashboard built on the Hacker News sample dataset. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/).



## About the dataset

[Hacker News](https://news.ycombinator.com/) is a social news website focusing on computer science and entrepreneurship. It is run by Y Combinator, a startup accelerator, and it's known for its minimalist interface. Users can post stories (such as links to articles), comment on them, and vote them up or down, affecting their visibility.

There are two ways to access the dataset:

- Through the `sample_data` database, which contains a sample of the data (from **January 2022** to **November 2022**). This database is automatically attached to every MotherDuck account.
- Through the `hacker_news` database, which contains the full dataset (from **2016** to **2025**).

To attach the full `hacker_news` database, you can use the following command:

:::note `aws-us-east-1` region only
The `hacker_news` database is only available for accounts in the `aws-us-east-1` region.
:::

```sql
ATTACH 'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b' AS hacker_news;
```

To read from the `sample_data` database, please refer to [attach the sample datasets database](./datasets.mdx)

## Example queries

### Most shared websites
This query returns the top domains being shared on Hacker News.

```sql
SELECT
    regexp_extract(url, 'http[s]?://([^/]+)/', 1) AS domain,
    count(*) AS count
FROM sample_data.hn.hacker_news
WHERE url IS NOT NULL AND regexp_extract(url, 'http[s]?://([^/]+)/', 1) != ''
GROUP BY domain
ORDER BY count DESC
LIMIT 20;
```

### Most commented stories each month
This query calculates the total number of comments for each story and identifies the most commented story of each month.

```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        descendants AS nb_comments,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER()
            OVER (
                PARTITION BY YEAR(timestamp), MONTH(timestamp)
                ORDER BY descendants DESC
            )
        AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)

SELECT
    year,
    month,
    title,
    hn_url,
    nb_comments
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```

### Most monthly voted stories
This query determines the most voted story for each month.

```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        score,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER()
            OVER (PARTITION BY YEAR(timestamp), MONTH(timestamp) ORDER BY score DESC)
        AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)

SELECT
    year,
    month,
    title,
    hn_url,
    score
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```

### Keyword analysis
This query counts the monthly mentions a the keyword (here `duckdb`) in the title or text of Hacker News posts, organized by year and month.

```sql
SELECT
    YEAR(timestamp) AS year,
    MONTH(timestamp) AS month,
    COUNT(*) AS keyword_mentions
FROM sample_data.hn.hacker_news
WHERE
    (title LIKE '%duckdb%' OR text LIKE '%duckdb%')
GROUP BY year, month
ORDER BY year ASC, month ASC;
```

## Schema

| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|-----|---------|-------|
| title       | VARCHAR     | YES  |     |         |       |
| url         | VARCHAR     | YES  |     |         |       |
| text        | VARCHAR     | YES  |     |         |       |
| dead        | BOOLEAN     | YES  |     |         |       |
| by          | VARCHAR     | YES  |     |         |       |
| score       | BIGINT      | YES  |     |         |       |
| time        | BIGINT      | YES  |     |         |       |
| timestamp   | TIMESTAMP   | YES  |     |         |       |
| type        | VARCHAR     | YES  |     |         |       |
| id          | BIGINT      | YES  |     |         |       |
| parent      | BIGINT      | YES  |     |         |       |
| descendants | BIGINT      | YES  |     |         |       |
| ranking     | BIGINT      | YES  |     |         |       |
| deleted     | BOOLEAN     | YES  |     |         |       |

---

Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/choose-database

# Specify MotherDuck database
> Specify MotherDuck database
When you connect to MotherDuck you can specify a database name or omit the database name and connect to the default database.

- If you use `md:` without a database name, you connect to a default MotherDuck database called `my_db`.
- If you use `md:`, you connect to the `` database.

After you establish the connection, either the default database or the one you specify becomes the current database.

You can run the `USE` command to switch the current database, as shown in the following example.

```python
#list the current database
con.sql("SELECT current_database()").show()
# ('database1')

#switch the current database to database2
con.sql("USE database2")
```

To query a table in the current database, you can specify just the table name. To query a table in a different database, you can include the database name when you specify the table. You don't need to switch the current database. The following examples demonstrate each method.

```sql
#querying a table in the current database
con.sql("SELECT count(*) FROM mytable").show()

#querying a table in another database
con.sql("SELECT count(*) FROM another_db.another_table").show()
```

---

Source: https://motherduck.com/docs/getting-started/e2e-tutorial/part-3

# 3 - Sharing Your Database
> Learn how to share your databases and collaborate with your team
In this section, you'll learn how to share your databases with colleagues and collaborate effectively using MotherDuck's sharing features.

👈 **[Go back to Part 2: Loading Your Dataset](../part-2)**

## Creating and sharing your data

Let's create a table with sample data in your playground database, then share it with others. The `docs_playground` database is automatically created when you connect, so you can start experimenting right away!

First, let's populate your playground database with some currency exchange data:



## Sharing your database

With your database and sample data in place, you can share this dataset with others. MotherDuck shares create a point-in-time snapshot of your database that can be accessed by specified users or groups.

When creating a share, the most important parameters control **access scope**, **visibility**, and **update behavior**. By default, shares use `ACCESS ORGANIZATION` (only your organization members can access), `VISIBILITY DISCOVERABLE` (appears in your organization's shared database list), and `UPDATE MANUAL` (creates a static snapshot that doesn't auto-update).

The syntax to create a share visible to everyone in your Organization is `CREATE SHARE  from `.

### SQL



### MotherDuck UI

You can also create shares through the MotherDuck UI by clicking the dropdown menu next to your database and selecting the share option. This will open a window to configure your share settings.

![share 1](./img/screenshot_tutorial_share_1_2.png)
![share 2](./img/screenshot_tutorial_share_2_2.png)

Once created, all members of your organization will be able to view this share in the MotherDuck UI under "Shared with me".

Learn more about [sharing in MotherDuck](../../key-tasks/sharing-data/sharing-within-org.md).

## Understanding share configuration

When creating shares, you can control three key aspects: **who can access** the data, **how users discover** the share, and **when the data updates**. Each parameter has specific options that determine the sharing behavior.

### ACCESS - who can access the share

- **`ACCESS ORGANIZATION`** (default): Only members of your organization can access the share
- **`ACCESS UNRESTRICTED`**: All MotherDuck users in the same cloud region as your Organization can access the share
- **`ACCESS RESTRICTED`**: Only the share owner has initial access; additional users must be granted access through `GRANT` commands

### VISIBILITY - how users discover the share

- **`VISIBILITY DISCOVERABLE`** (default): The share appears in your organization's "Shared with me" section for easy discovery
- **`VISIBILITY HIDDEN`**: Share can only be accessed through a direct URL; not listed in any user interface

:::info Important Visibility Rules
- Organization and Restricted shares default to `DISCOVERABLE`
- Unrestricted shares can only be `HIDDEN`
- Hidden shares can only be used with `ACCESS RESTRICTED`
:::

### UPDATE - when share data updates

- **`UPDATE MANUAL`** (default): Share content only updates when you run `UPDATE SHARE` command
- **`UPDATE AUTOMATIC`**: Share automatically reflects database changes within ~5 minutes

### Example share configurations





## Querying shared data

After creating a share, authorized users can access the shared database in two ways: by using the share URL directly or by attaching it as a database alias:

```sql
-- Attach a shared database
ATTACH 'md:_share/docs_playground/b556630d-74f1-435c-9459-cfb87d349cb3' AS shared_currency;

-- Query the shared data
SELECT * FROM shared_currency.currency_rates
WHERE rate_to_usd < 1.0
ORDER BY rate_to_usd DESC;
```

## Managing Shares

You can also manage your existing shares:



## Going further

Now that you've mastered the basics, here are some next steps to explore:

- Learn about [MotherDuck's Dual Execution](/key-tasks/running-hybrid-queries/) feature
- Connect to your favorite BI tools: [Tableau](../../integrations/bi-tools/tableau/index.mdx), [Power BI](../../integrations/bi-tools/powerbi/index.mdx) and learn more about [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
- Set up data pipelines with [dbt](../../integrations/transformation/dbt.md)
- Look at our [supported integrations](/integrations) to integrate with your data stack.

---

Source: https://motherduck.com/docs/getting-started/sample-data-queries/air-quality

# Air Quality
> Sample data from the WHO Ambient Air Quality Database to use with DuckDB and MotherDuck
## Explore the data

Interactive dashboard built on the WHO air quality dataset. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/).



## About the dataset

The [WHO Ambient Air Quality Database](https://www.who.int/publications/m/item/who-ambient-air-quality-database-(update-2023)) (6th edition, released in **May 2023**) compiles annual mean concentrations of nitrogen dioxide (NO2) and particulate matter (PM10, PM2.5) from ground measurements across over 8600 human settlements in more than 120 countries. This data, updated every 2-3 years since **2011**, primarily represents city or town averages and is used to monitor the Sustainable Development Goal Indicator 11.6.2, Air quality in cities.

To read from the `sample_data` database, please refer to [attach the sample datasets database](./datasets.mdx)

## Example queries

### Annual city air quality rating

This query assesses the average annual air quality in different cities per year based on WHO guidelines. It calculates the average concentrations of PM2.5, PM10, and NO2, then assigns an air quality rating of 'Good', 'Moderate', or 'Poor'. 'Good' indicates all pollutants are within WHO recommended levels, 'Poor' indicates all pollutants exceed WHO recommended levels, and 'Moderate' refers to any other scenario. The results are grouped and ordered by city and year.

```sql
SELECT
    city,
    year,
    CASE
        WHEN
            AVG(pm25_concentration) <= 10
            AND AVG(pm10_concentration) <= 20
            AND AVG(no2_concentration) <= 40
            THEN 'Good'
        WHEN
            AVG(pm25_concentration) > 10
            AND AVG(pm10_concentration) > 20
            AND AVG(no2_concentration) > 40
            THEN 'Poor'
        ELSE 'Moderate'
    END AS airqualityrating
FROM
    sample_data.who.ambient_air_quality
GROUP BY
    city,
    year
ORDER BY
    city,
    year;
```

### Yearly average pollutant concentrations of a city

This query calculates the yearly average concentrations of PM2.5, PM10, and NO2 in a given city, here `Berlin`.

```sql
SELECT
    year,
    AVG(pm25_concentration) AS avg_pm25,
    AVG(pm10_concentration) AS avg_pm10,
    AVG(no2_concentration) AS avg_no2
FROM sample_data.who.ambient_air_quality
WHERE city = 'Berlin'
GROUP BY year
ORDER BY year DESC;
```

## Schema

|    column_name     | column_type | null | key | default | extra |
|--------------------|-------------|------|-----|---------|-------|
| who_region         | VARCHAR     | YES  |     |         |       |
| iso3               | VARCHAR     | YES  |     |         |       |
| country_name       | VARCHAR     | YES  |     |         |       |
| city               | VARCHAR     | YES  |     |         |       |
| year               | BIGINT      | YES  |     |         |       |
| version            | VARCHAR     | YES  |     |         |       |
| pm10_concentration | BIGINT      | YES  |     |         |       |
| pm25_concentration | BIGINT      | YES  |     |         |       |
| no2_concentration  | BIGINT      | YES  |     |         |       |
| pm10_tempcov       | BIGINT      | YES  |     |         |       |
| pm25_tempcov       | BIGINT      | YES  |     |         |       |
| no2_tempcov        | BIGINT      | YES  |     |         |       |
| type_of_stations   | VARCHAR     | YES  |     |         |       |
| reference          | VARCHAR     | YES  |     |         |       |
| web_link           | VARCHAR     | YES  |     |         |       |
| population         | VARCHAR     | YES  |     |         |       |
| population_source  | VARCHAR     | YES  |     |         |       |
| latitude           | FLOAT       | YES  |     |         |       |
| longitude          | FLOAT       | YES  |     |         |       |
| who_ms             | BIGINT      | YES  |     |         |       |

---

Source: https://motherduck.com/docs/getting-started/customer-facing-analytics

# Customer-Facing Analytics Overview
> Build customer-facing embedded analytics with MotherDuck. Per-user isolation, sub-second SQL dashboards, and white-label analytics for SaaS—no complex infrastructure needed.
Customer-facing analytics (CFA), or embedded analytics, has requirements that traditional data architectures rarely meet. If you're building SaaS analytics dashboards, white-label reporting, or embedded data visualizations, CFA demands sub-second response times, per-customer isolation, and integration with operational applications — all while serving many concurrent end users.

MotherDuck addresses these needs through two architectural capabilities:

- **[Hypertenancy](#1-hypertenancy)**: Each customer gets their own dedicated DuckDB instance (Duckling), providing full compute isolation (so no resource contention between users), predictable performance, and the ability to scale resources independently based on individual customer needs.

- **[Dual execution](#2-dual-execution-for-zero-latency-exploration)**: Enabled by DuckDB's lightweight architecture, queries can run both in the cloud and directly in the client's browser through WebAssembly, delivering near-instantaneous data exploration and filtering.

This guide explains how MotherDuck's architecture addresses the [core CFA challenges](#the-cfa-challenge) and provides [implementation patterns](#implementation-patterns) you can ship.

## What is customer-facing analytics?

**Customer-Facing Analytics (CFA)** embeds analytics directly into operational applications for external users—customers, partners, or end-users—rather than internal stakeholders. Traditional BI targets internal teams, runs on batch-processed data models, serves a small number of users, and tolerates higher-latency queries.

| Dimension      | Traditional BI                  | Customer-Facing Analytics            |
| -------------- | ------------------------------- | ------------------------------------ |
| **Audience**   | Internal (analysts, executives) | External (customers, partners)       |
| **Delivery**   | BI tools (Tableau, Looker)      | Embedded in application              |
| **Latency**    | Seconds to minutes acceptable   | Milliseconds to low seconds required |
| **Scale**      | Dozens to hundreds of users     | Thousands to millions of users       |
| **Isolation**  | Shared warehouse                | Per-customer isolation needed        |
| **Tech Stack** | Python, BI tools                | JavaScript, embedded SDKs            |

"Customer-facing analytics" and "embedded analytics" get used interchangeably. Both describe integrating analytical capabilities directly into a product instead of sending users to a separate BI tool. The difference is one of emphasis: customer-facing analytics focuses on the *audience* (your customers), while embedded analytics focuses on the *delivery* (built into your app). MotherDuck supports both.

### Common use cases

- **SaaS analytics dashboards:** give customers self-serve analytics within your product, covering usage metrics, performance KPIs, and ROI reporting
- **White-label analytics:** offer analytics under your brand that customers can explore without leaving your app
- **Embedded dashboards:** drop interactive charts and tables directly into your application UI
- **Multi-tenant reporting:** serve thousands of customers from one platform while keeping each tenant's data and compute isolated

:::info
**What about AI-driven analytics?**
AI-driven analytics enables natural language interactions with data, allowing users to ask conversational questions like "What were our top-selling products last quarter?" and get immediate answers. MotherDuck's [hypertenancy](/concepts/hypertenancy) and dual execution make it well-suited for building AI-driven analytics solutions. Learn how to [build analytics agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/).
:::

## The CFA challenge

Building customer-facing analytics systems presents three core challenges:

### Challenge 1: Technology stack mismatch

For many applications, the data sits in a transactional database (OLTP database) like Postgres or MySQL. Engineers building CFA features often run analytical queries directly in a multi-tenant transactional database, which works until it fails at scale. Row-based storage and transactional databases are not designed for efficient analytical querying.

![Crying Database](./img/crying_db.webp)

Operational applications often live in JavaScript/TypeScript, but traditional data tools are Python-centric. Operational teams work with OLTP databases built for transactions, while data teams use OLAP systems tuned for analytics but with their own challenges. Analytical workloads spike with user activity, while transactional loads need steady compute.

### Challenge 2: Latency requirements

Users expect sub-second response times—typical for OLTP systems. Anything slower degrades the application experience. Distributed OLAP systems (BigQuery, Snowflake, Databricks) often have cold starts and coordination overhead that keep them above those targets, even for small datasets.

Teams often add caching layers or refresh pipelines between OLTP and OLAP. That adds complexity, introduces another failure point, and delays data freshness.

### Challenge 3: Multi-tenancy at scale

Switching to an analytics engine is the first step. Many legacy OLAP engines were designed for internal analytics and are provisioned as a single instance or cluster for all customer data, leading to downstream complexities:

![Legacy Data Warehouse](./img/legacy_data_warehouse.png)

- **Overprovisioning**: Resources sized for peak load sit idle most of the time
- **Noisy neighbors**: Large customer impacts small customers
- **Resource contention**: Concurrency limits affect everyone
- **Unpredictable performance**: Query times vary based on load
- **Security concerns**: All customer data in one shared system

## Why MotherDuck for customer-facing analytics?

MotherDuck's architecture aligns with the requirements of Customer-Facing Analytics. Two architectural advantages set it apart:

### 1. Hypertenancy

MotherDuck provisions a Duckling (DuckDB instance) for each customer (or even for each customer's users). This [hypertenancy](/concepts/hypertenancy) model isolates customer data and delivers consistent DuckDB performance to each user.

![Happy Database](./img/happy_db.webp)

**Why single-node beats distributed compute clusters for CFA**

Traditional data warehouses use distributed computing with coordination overhead, data shuffling, and network latency. Even a fast query typically takes a second or more because of this overhead.

DuckDB and MotherDuck use single-node, optimized columnar execution:

- Zero network hops
- Zero coordination overhead
- Optimized vectorized execution

For CFA workloads that query one customer's data at a time, single-node execution is usually faster than distributed, and MotherDuck can reach **subsecond performance**.

#### Scaling analytics up and out

Each customer (and possibly each of their users) has their **own MotherDuck Duckling** (DuckDB instance). One account could run hundreds or thousands of Ducklings at a time, or none. This serverless model underpins MotherDuck's advantage versus other engines.

MotherDuck's **cold start time is ~1 second**, and **per-second billing** (1-second minimum) keeps individual queries cost-efficient.

:::note
While MotherDuck supports provisioning one Duckling per user, start simpler. Begin with a single Duckling and introduce per-user isolation and dedicated [read scaling tokens](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) as your user base grows beyond 100 users or when tighter performance guarantees are needed.
:::

![MD Router](./img/md_router.svg)

This isolated Duckling approach with vertical scaling delivers:

- **Perfect isolation**: No noisy neighbors
- **Predictable performance**: Dedicated resources per customer
- **Cost-effective**: Pay only for what each customer needs
- **Easy scaling**: Vertically scale individual ducklings as needed

Scale vertically by upgrading (or downgrading) the Duckling size your application uses for each customer, giving more power to higher-priority customers. If you need more compute or higher concurrency, launch [read scaling Ducklings](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for compute-hungry customers.

MotherDuck offers several [Duckling sizes](/about-motherduck/billing/duckling-sizes/) for larger workloads.

For programmatic changes to user settings, refer to our [API docs](/sql-reference/rest-api/motherduck-rest-api/).

### White-label analytics

Many SaaS companies need analytics that look and feel native to their product. MotherDuck's architecture supports white-label analytics by design:

- **Per-customer isolation:** each tenant gets a dedicated Duckling, with no shared infrastructure leaking through
- **Flexible query layer:** use any frontend charting library (Recharts, D3, Observable Plot) with MotherDuck as the SQL backend
- **No vendor branding:** unlike embedded BI tools that surface their own UI, MotherDuck powers your queries behind the scenes
- **DuckDB-Wasm for client-side execution:** ship analytics that run entirely in the browser for maximum responsiveness

### 2. dual execution for zero-latency exploration

As you build Customer-Facing Analytics into your product, you need sub-second response times so customers can explore their data quickly. Distributed data warehouses rarely meet that bar.

Because MotherDuck is built on DuckDB, you can connect from any DuckDB client. DuckDB is an in-process database, so it **can run on your server (3-tier) or directly in the client's browser through WebAssembly (1.5-tier)**.

This enables "dual execution": combining local data and compute with cloud data and compute in a single query, giving you flexibility to optimize for performance and cost.

**Traditional approach has multiple network hops:**

```mermaid
flowchart LR
    subgraph Client Side
        User{{"USER"}}:::green
        Browser["CLIENT (Browser)"]
    end

    subgraph Server Side
        Server["SERVER"]:::watermelon
        Database[("DATABASE")]:::yellow
    end

    User --> Browser
    Browser --> Server
    Server --> Database
```

**DuckDB-Wasm enables client-side execution:**

```mermaid
flowchart LR
    subgraph Client Side
        User{{"USER"}}:::green
        subgraph Browser["CLIENT (Browser)"]
            LocalDB[("DATABASE")]:::database
        end
    end

    subgraph Server Side
        CloudDB[("DATABASE")]:::database
    end

    User --> Browser
    Browser --> CloudDB
```

Because the same DuckDB SQL engine runs on both MotherDuck Ducklings and on your customers' machines, you can offload data processing to their laptops and provide fast data exploration, filtering, and sorting using SQL. Customers do not need to install anything because DuckDB runs inside the web browser using WebAssembly (Wasm).

You can see this experience in [Column Explorer](/getting-started/interfaces/motherduck-quick-tour/) and [Instant SQL](https://motherduck.com/blog/introducing-instant-sql/) in the MotherDuck UI. Here's a teaser of it in action:

![Instant SQL](./img/fast_queries.gif)

## Implementation patterns

MotherDuck enables two distinct architectural patterns for customer-facing analytics:

### 3-tier architecture

**Best for:** Applications requiring server-side authorization, business logic, or deployments to stateful platforms.

**Typical web application architecture:**

```mermaid
flowchart LR
    Frontend["Browser (React Frontend)"]
    Backend["Application Server (Express / FastAPI)"]
    MotherDuck[("MotherDuck (Cloud Database)")]:::yellow

    Frontend -->|"API Requests"| Backend
    Backend -->|"Persistent Connection, SQL Queries"| MotherDuck
```

**Key Benefits:**

- Persistent database connection (connection pooling saves ~200ms per request)
- Fast query performance (~50-100ms)
- Server-side security and authorization
- Works with any DuckDB client (Node.js, Python, Go, Rust, Java)

**Performance optimizations:**

1. Intermediate table results: Pre-aggregate data on MotherDuck for faster queries
2. Prefer one well-structured SQL statement that returns all needed metrics (using SELECT with multiple aggregates, CASE/FILTER, or UNION ALL).
3. For multi-step workflows, wrap statements in a BEGIN … COMMIT transaction to ensure atomicity.
4. For data movement, use bulk operations (COPY, INSERT … SELECT) instead of many row-by-row calls.
5. Application Caching: Cache rarely-changing data on your server to avoid any extra queries on MotherDuck

**When to use:**

- You need server-side authorization and business logic
- You want a traditional, battle-tested architecture
- You're deploying to stateful services (Cloud Run, ECS, Kubernetes)
- Your team works with multiple languages



### 1.5-tier architecture (DuckDB-Wasm)

**Best for:** Read-heavy dashboards with `<1GB` data per user where you need maximum performance. This works well for embedded dashboards with interactive charts, tables, and filters that respond in under 10ms because queries execute locally in the user's browser.

**Architecture:**

```mermaid
flowchart LR
    Browser["Browser
(React + MotherDuck Wasm SDK)"] MotherDuck[("MotherDuck
(Cloud Database)")]:::yellow Browser -->|"Initial data fetch
Query execution"| MotherDuck ``` **Key Benefits:** - Sub-10ms query latency (queries run locally in browser) - Near-zero server costs (just data transfer) - Offline support after initial data load - Infinite scalability (users provide compute) **Performance optimizations:** 1. **Optimize Initial Load**: Use Parquet compression, limit to `<50MB` 2. **IndexedDB Persistence**: Data survives page reloads 3. **Incremental Sync**: Only fetch new data since last sync **When to use:** - Read-heavy dashboards with frequent filtering/drilling - Want `<10ms` query latency - Data per user is `<1GB` - Want to minimize server costs :::info WebAssembly applications using multi-threading (including DuckDB-Wasm) require cross-origin isolation. This means your page must be served with specific headers (`Cross-Origin-Embedder-Policy: require-corp` and `Cross-Origin-Opener-Policy: same-origin`), and resources from different origins must include a `Cross-Origin-Resource-Policy: cross-origin` header. If you're building a new application, a dedicated page is easier to manage within these constraints. If you have existing dependencies (iframes, third-party scripts, etc.) and need to integrate analytics into an existing page, the 3-tier architecture is recommended. ::: #### Hands-on example See our [1.5-tier architecture example](https://github.com/motherduckdb/wasm-client/tree/main/examples/nypd-complaints) demonstrating best practices for building a 1.5-tier analytics application using TypeScript, React and the MotherDuck Wasm SDK. ### 3-Tier vs 1.5-Tier | Factor | 3-Tier | 1.5-Tier (DuckDB-Wasm) | | --------------------- | ------------------- | ---------------------- | | **Query latency** | ~50-100ms | ~5-20ms ⚡ | | **Server cost** | $$ (per request) | $ (data transfer only) | | **Scalability** | High (auto-scaling) | ♾️ Unlimited | | **Data per user** | Any size | `<1GB` optimal | | **Offline support** | ❌ No | ✅ Yes | | **Server-side logic** | ✅ Yes | ❌ Limited | | **Best for** | Complex logic, auth | Read-heavy dashboards | ### Additional resources - [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/) - [Read Scaling Ducklings](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) - [Duckling Sizes](/about-motherduck/billing/duckling-sizes/) ## FAQ ### What is embedded analytics? Embedded analytics means putting data visualizations, dashboards, and interactive reports directly inside a software application. Users explore data in the product they already use rather than switching to a separate BI tool. MotherDuck powers embedded analytics with sub-second SQL queries and per-user compute isolation. ### What is the difference between embedded analytics and traditional BI? Traditional BI is built for internal teams using standalone tools like Tableau or Looker. Embedded analytics is for your external customers, living inside your product. That difference creates harder technical requirements: you need lower latency, higher concurrency (potentially thousands of simultaneous users), and per-tenant data isolation. MotherDuck's Duckling architecture handles all three. ### What is white-label analytics? White-label analytics lets you offer data analytics under your own brand. Your customers see dashboards that match your product's look and feel, with no third-party logos visible. MotherDuck supports this by providing a SQL query engine (DuckDB) that runs behind your UI — there's no user-facing vendor footprint. ### How do you add analytics to a SaaS product? Two main approaches. In a 3-tier architecture, your server queries MotherDuck and returns results to the frontend. This works well when you have complex auth or business logic. In a 1.5-tier architecture, DuckDB runs directly in the browser through WebAssembly, which is a better fit for read-heavy dashboards where each user's data stays under 1GB. Both approaches give you fast query performance. ### What is multi-tenant analytics? Multi-tenant analytics means serving multiple customers from one shared platform while keeping each customer's data separate. MotherDuck works differently, through Hypertenancy — every tenant gets a dedicated DuckDB instance (a Duckling). This avoids noisy-neighbor problems and keeps performance predictable while maintaining data isolation between each customer. --- Source: https://motherduck.com/docs/getting-started/interfaces/connect-query-from-duckdb-cli # DuckDB CLI > Learn to connect and query databases using MotherDuck from the DuckDB CLI ## Installation :::note MotherDuck supports DuckDB 1.5.3. In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions 1.4.0 through 1.5.3. In **US West (Oregon) -** `us-west-2`, MotherDuck supports client versions 1.4.1 through 1.5.3. In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client versions 1.4.1 through 1.5.3. ::: Download and install the DuckDB binary, depending on your operating system. ### Windows 1. Download the 64-bit Windows binary [here](https://github.com/duckdb/duckdb/releases/download/v1.5.3/duckdb_cli-windows-amd64.zip) 2. Extract the zip file. ### macOS The recommended way to install the CLI is with the MotherDuck install script: ### Install with bash ```bash curl -s https://install.motherduck.com | sh ``` ### Linux 1. Download the Linux binary: - For 64-bit, download the binary [here](https://github.com/duckdb/duckdb/releases/download/v1.5.3/duckdb_cli-linux-amd64.zip) - For arm64/aarch64, download the binary [here](https://github.com/duckdb/duckdb/releases/download/v1.5.3/duckdb_cli-linux-aarch64.zip) 2. Extract the zip file. For more information, see the [DuckDB installation documentation](https://duckdb.org/docs/installation/). ## Try it Walk through starting DuckDB, attaching MotherDuck, and running your first query in the playground below. Each step explains what happens before you press Enter, so you can preview the full flow before running it on your machine. ## Step by step ### Start the DuckDB CLI After installing, start DuckDB from your terminal: ```sh duckdb ``` DuckDB opens an in-memory database by default, so any tables you create won't persist when you exit. Pass a filename to open or create a persistent local database: ```sh duckdb mydatabase.duckdb ``` ### Connect to MotherDuck From inside the DuckDB CLI, attach MotherDuck: ```sql ATTACH 'md:'; ``` DuckDB downloads the signed MotherDuck extension and opens your default browser to authenticate. Follow the instructions in the terminal. To list your MotherDuck databases and confirm the connection, run: ```sql SHOW DATABASES; ``` You can query local DuckDB data and MotherDuck databases from the same session. For more on persisting your authentication credentials, see [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md). :::tip You can also connect to MotherDuck directly when starting DuckDB: ```bash duckdb "md:" ``` ::: :::note Manual extension update When MotherDuck releases a new extension version you can force-reinstall the extension from the CLI. ```sh FORCE INSTALL motherduck; ``` ::: ### Open the MotherDuck UI from the CLI Launch the MotherDuck UI from your terminal: ```bash duckdb -ui ``` If you're already in a DuckDB session, run `CALL start_ui();` instead. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/java # Java (JDBC) > MotherDuck + Java The MotherDuck integration with Java is no different than DuckDB. For more information, see [Java](https://duckdb.org/docs/stable/clients/java.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/kaggle-movies # Kaggle Movies > A dataset of over 40,000 movies with titles, overviews, and pre-computed embeddings for semantic search. ## Explore the data Interactive dashboard with semantic search on the Kaggle Movies sample dataset. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). ## About the dataset This dataset is a subset of the [Kaggle Movies Dataset](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset), containing over 40,000 movie titles and overviews. It also includes pre-computed 512-dimensional vector embeddings (generated with OpenAI's `text-embedding-3-small` model) for both the title and overview fields, making it useful for experimenting with [semantic search](/key-tasks/ai-and-motherduck/text-search-in-motherduck/) in MotherDuck. ## How to query the dataset This dataset is available as part of the `sample_data` database, which is automatically attached to every MotherDuck account. ## Example queries ### Browse movies ```sql SELECT title, overview FROM sample_data.kaggle.movies LIMIT 10; ``` ### Find similar movies using vector search Use the pre-computed embeddings together with the [`embedding`](/sql-reference/motherduck-sql-reference/ai-functions/embedding/) function to find movies similar to a search query: ```sql SELECT title, overview, array_cosine_similarity( overview_embeddings, embedding('a space adventure with aliens') ) AS similarity FROM sample_data.kaggle.movies WHERE overview IS NOT NULL ORDER BY similarity DESC LIMIT 10; ``` ### Find movies similar to another movie ```sql WITH target AS ( SELECT overview_embeddings FROM sample_data.kaggle.movies WHERE title = 'The Matrix' LIMIT 1 ) SELECT m.title, m.overview, array_cosine_similarity(m.overview_embeddings, t.overview_embeddings) AS similarity FROM sample_data.kaggle.movies m, target t WHERE m.title != 'The Matrix' ORDER BY similarity DESC LIMIT 10; ``` ## Schema | Column Name | Column Type | Description | |-----------------------|-------------|-----------------------------------------------------------------| | title | VARCHAR | Movie title | | overview | VARCHAR | Short description or synopsis of the movie | | title_embeddings | FLOAT[512] | Pre-computed vector embedding of the title | | overview_embeddings | FLOAT[512] | Pre-computed vector embedding of the overview | --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/loading-data-into-md # Loading data into MotherDuck with Python > Load CSV, Parquet, and JSON files into MotherDuck from local, S3, or HTTPS sources using Python. ## Copying a table from a local DuckDB database into MotherDuck You can use `CREATE TABLE AS SELECT` to load CSV, Parquet, and JSON files into MotherDuck from either local, Amazon S3, or https sources as shown in the following examples. ```python # load from local machine into table mytable of the current/active used database con.sql("CREATE TABLE mytable AS SELECT * FROM '~/filepath.csv'"); # load from an S3 bucket into table mytable of the current/active database con.sql("CREATE TABLE mytable AS SELECT * FROM 's3://bucket/path/*.parquet'") ``` If the source data matches the table’s schema exactly you can also use `INSERT INTO ... SELECT` to append data, as shown in the following example. ```python # append to table mytable in the currently selected database from S3 con.sql("INSERT INTO mytable SELECT * FROM ‘s3://bucket/path/*.parquet’") ``` :::tip Use `INSERT INTO ... SELECT` to load data from files as shown above. Do not use single-row `INSERT INTO ... VALUES` statements in a loop — this is significantly slower because each statement incurs separate network overhead. See [Loading data best practices](/key-tasks/loading-data-into-motherduck/considerations-for-loading-data/) for more detail. ::: ## Copying an entire local DuckDB database to MotherDuck MotherDuck supports copying your opened DuckDB database into a MotherDuck database. The following example copies a local DuckDB database named `localdb` into a MotherDuck-hosted database named `clouddb`. ```python # open the local db local_con = duckdb.connect("localdb.ddb") # connect to MotherDuck local_con.sql("ATTACH 'md:'") # The from indicates the file to upload. An empty path indicates the current database local_con.sql("CREATE DATABASE clouddb FROM CURRENT_DATABASE()") ``` A local DuckDB database can also be copied by its file path: ```sql local_con = duckdb.connect("md:") local_con.sql("CREATE DATABASE clouddb FROM 'localdb.ddb'") ``` See [Loading Data into MotherDuck](/key-tasks/loading-data-into-motherduck/loading-data-into-motherduck.mdx) for more detail. --- Source: https://motherduck.com/docs/getting-started/interfaces/motherduck-quick-tour # MotherDuck Web UI > A guide to the MotherDuck Web UI — write SQL with Instant SQL, use AI to fix and edit queries, and explore your data interactively. ## Getting started To log in to the MotherDuck UI, go to [app.motherduck.com](https://app.motherduck.com/). :::info You can also open the web UI directly from the DuckDB CLI: ```bash duckdb "md:" -ui ``` ::: ### Main window The MotherDuck UI is organized around a notebook-style editor with a database browser on the left and results inspection on the right. ![UI](../img/screenshot_ui.png) ## Instant SQL: write SQL with real time feedback **Instant SQL** gives you keystroke-fast query previews — results update as you type, with no run button needed. Under the hood, MotherDuck uses [dual execution](/concepts/architecture-and-capabilities/#dual-execution) to parse and run your query locally first, giving you immediate feedback while full cloud results load in the background. A caching indicator in the cell header shows when results are served from local cache. {/* TODO: Replace with new "Instant SQL" video once recorded — see .context/video-outlines.md for shot list */} ### Enabling instant SQL Toggle Instant SQL on or off per cell using: - The **Instant SQL toggle** in the cell header - The keyboard shortcut `Ctrl`/`⌘` + `Shift` + `.` ### What works with instant SQL - **Filtering in real time:** Add or change a `WHERE` clause and watch results narrow instantly. - **Multi-statement cells:** Click on any individual statement within a multi-statement cell to preview just that one. - **Window functions:** Window functions are fully supported in Instant SQL previews. ## Fix errors and edit queries with AI MotherDuck's AI features help you fix broken queries, rewrite SQL in plain English, and generate queries from scratch — all without leaving the editor. ### "Help me fix this broken query" — FixIt When you run a query that has an error, **FixIt** automatically analyzes the error and suggests an inline fix. Click to accept and re-run in one step. {/* TODO: Replace with new "Fix errors and edit SQL with AI" video once recorded — see .context/video-outlines.md for shot list */} By default, FixIt auto-suggests fixes whenever an error occurs. You can turn off auto-suggest and still trigger FixIt manually by clicking **Suggest fix** at the bottom of any error message. ![FixIt manual trigger](../../key-tasks/img/fixit-manual-suggestion.png) Toggle auto-suggest in **Settings → Preferences → Enable inline SQL error fix suggestions**. :::tip Free for all users FixIt is available on all plans, including the free tier. ::: ### "Modify my SQL using plain english" — edit Select text in your query (or place your cursor anywhere) and press `Ctrl`/`⌘` + `Shift` + `E` to open the **Edit** dialog. Describe what you want to change in natural language: ![Edit prompt](../../key-tasks/img/edit-prompt.png) Review the suggestion, then iterate with follow-up prompts if needed: ![Edit follow-up](../../key-tasks/img/edit-follow-up.png) When you're happy with the result, click **Apply edit** to update your query. ![Edit applied](../../key-tasks/img/edit-follow-up-2.png) ### Going further with SQL assistant functions For programmatic AI access (text-to-SQL, query explanation, schema understanding), see the [SQL Assistant functions](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/) reference. These are available in any DuckDB client connected to MotherDuck, not just the web UI. ## Explore your results ### Interactive data grid Query results load into an interactive data grid where you can sort, filter, and pivot without writing more SQL. Click the **Expand** button at the top right of any cell to go full-screen on the editor and results. ![Expand cells](../img/screenshot_expand_cells.png) ### Column explorer The Column Explorer shows statistics for every column in a table or result set — value frequencies, NULL percentages, histograms for numeric columns, and time-series charts for timestamp columns. {/* TODO: Replace with new "Explore your data" video once recorded — see .context/video-outlines.md for shot list */} Toggle the Column Explorer with `Ctrl`/`⌘` + `I` or the toggle button at the top right of the results panel. ### Cell content pane Click any cell in the results grid to see its full contents in the Cell Content Pane. ![Cell content — long text](../img/cell_content_long_text.png) For JSON columns, you can expand and collapse nodes, copy the value, or copy the keypath to any nested field. ![Cell content — JSON](../img/cell_content_json.png) ## Write queries faster ### Autocomplete Autocomplete suggests SQL syntax, table names, column names, and functions as you type. Turn it off in **Settings → Preferences → Enable autocomplete when typing**. ### Inline docs Hover over any SQL function in the editor to see its description, parameter types, and return type. Click the **Docs** link in the tooltip to open the full reference. ![Image](useBaseUrl('/img/getting-started/ui-inline-docs.png')) Turn off Inline Docs in **Settings → Preferences → Enable Inline Docs**. ### Format SQL Press `Ctrl`/`⌘` + `Alt`/`⌥` + `O` to auto-format the SQL in your current cell. When text is selected, only the selection is formatted. ## Navigate the workspace ### Object explorer Browse your databases, schemas, and tables in the left-hand panel. Toggle it with `Ctrl`/`⌘` + `B`. ### Command menu Press `Ctrl`/`⌘` + `K` to open the command menu for quick access to actions, notebooks, and settings. ### Notebook and worksheet views Toggle between notebook view (multiple cells) and worksheet view (single expanded cell) with `Ctrl`/`⌘` + `E`. ### Running queries The Running Queries page, found under **Settings** → **Running Queries**, lets you monitor and manage long-running queries on your Duckling. For each query, you can see: - **Query**: The SQL text of the query (click to expand the full statement). - **Status**: Whether the query is active or has completed. - **Start time**: When the query started executing. - **Elapsed time**: How long the query has been running. This is useful for identifying queries that are taking longer than expected. You can cancel a running query directly from this page. For programmatic access to active connections and query cancellation through SQL, see [`md_active_server_connections()`](/sql-reference/motherduck-sql-reference/connection-management/monitor-connections/) and [`md_interrupt_server_connection()`](/sql-reference/motherduck-sql-reference/connection-management/interrupt-connections/). For a broader view of query activity across your organization, see the [`RECENT_QUERIES`](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) and [`QUERY_HISTORY`](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) views. ### Duckling overview The Duckling overview page, found under **Settings** → **Duckling overview**, gives organization admins an at-a-glance view of activity across every Duckling in the organization over the last 24 hours. For each Duckling, you can see: - **Account**: The MotherDuck user or service account the Duckling belongs to. - **Status**: Whether the Duckling is running normally or has encountered errors. - **Spills**: Whether queries on this Duckling spilled to disk, which indicates memory pressure from larger-than-memory workloads. - **Active minutes**: How long the Duckling was actively running queries over the last 24 hours. Click a Duckling row to drill in. A bar chart visualizes query activity over time, and a table below lists individual queries. Click a query to open a side panel with the full SQL text, or open a dedicated focus page for a single query. ![Duckling overview drill-down showing summary stats, a query activity bar chart, and a table of top queries](../img/duckling-overview-drilldown.png) Use the timezone toggle in the page header to switch between UTC and your local time. This page is admin-only and is built on the [`QUERY_HISTORY`](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) view, so it has the same ingestion delay — queries from the last few seconds may not appear yet. For a programmable view of the same data, or a more real-time view of ongoing queries, see the [`QUERY_HISTORY`](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) and [`RECENT_QUERIES`](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) views. ## Keyboard shortcuts Use `Ctrl` for Windows/Linux and `⌘` (Command) for Mac. Use `Alt` for Windows/Linux and `⌥` (Option) for Mac. ### Running queries | Command | Action | |---------|--------| | `Ctrl`/`⌘` + `Enter` | Run the current cell. | | `Ctrl`/`⌘` + `Shift` + `Enter` | Run selected text in the current cell. If no text is selected, run the whole cell. | | `Shift` + `Enter` or `Alt`/`⌥` + `Enter` | Run the current cell, then advance to the next cell (creates a new one if needed). | ### Editing | Command | Action | |---------|--------| | `Ctrl`/`⌘` + `z` | Undo within current cell. | | `Ctrl`/`⌘` + `Shift` + `z` | Redo within current cell. | | `Ctrl`/`⌘` + `Alt`/`⌥` + `o` | Format SQL in the current cell (or selection). | | `Ctrl`/`⌘` + `/` | Toggle line comments (`--`). | | `Tab` | Indent current line (in editor). | | `Shift` + `Tab` | De-indent current line (in editor). | ### AI features | Command | Action | |---------|--------| | `Ctrl`/`⌘` + `Shift` + `.` | Toggle [Instant SQL](#instant-sql-write-sql-with-real-time-feedback) on/off for the active cell. | | `Ctrl`/`⌘` + `Shift` + `e` | Open [Edit](#modify-my-sql-using-plain-english--edit) for your current cell or selected text. | ### Navigation and layout | Command | Action | |---------|--------| | `Ctrl`/`⌘` + `k` | Open the command menu. | | `Ctrl`/`⌘` + `/` | Search notebooks, databases and more. | | `Ctrl`/`⌘` + `b` | Toggle the Object Explorer (left panel). | | `Ctrl`/`⌘` + `i` | Toggle the Column Explorer (right panel). | | `Ctrl`/`⌘` + `e` | Toggle notebook/worksheet view for the active cell. | | `Ctrl`/`⌘` + `↑` | Move current cell up. | | `Ctrl`/`⌘` + `↓` | Move current cell down. | | `Esc` | Switch `Tab` to UI navigation mode (reverts on next cell selection). | ## Settings Settings are found by clicking your profile at the top-left. | Section | Setting | Description | |---------|---------|-------------| | **Organization** | Details | Change the display name of the organization. Enable all users in your email domain to join. See [Managing Organizations](/key-tasks/managing-organizations). | | | Plans | View your current plan (Free, Standard) and switch plans. | | | Members | View and invite members to your organization. Members include human users and [service accounts](/key-tasks/service-accounts-guide/). | | **My Account** | Preferences | Enable [autocomplete](#autocomplete), inline [SQL error fix suggestions](#help-me-fix-this-broken-query--fixit) (FixIt), and [Inline Docs](#inline-docs). | | | Notifications | Configure notification preferences. | | | Ducklings | Manage [Duckling sizes](/about-motherduck/billing/duckling-sizes/#duckling-sizes), [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) pool size, version information, and Duckling reset for troubleshooting. | | **Integrations** | Access Tokens | Create tokens for programmatically [authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck). Tokens can have expiry dates. | | | Secrets | Store credentials for [AWS S3](/integrations/cloud-storage/amazon-s3), [Azure Blob Storage](/integrations/cloud-storage/azure-blob-storage), [Google Cloud Storage](/integrations/cloud-storage/google-cloud-storage), Cloudflare R2, and Hugging Face. | | **Monitor** | Running Queries | View and manage active queries. | | | Duckling overview | (Admin-only) View activity across every Duckling in your organization over the last 24 hours. See [Duckling overview](#duckling-overview). | | **Data** | Databases | Browse and manage your databases. | | | Shares | View and manage [shared databases](/key-tasks/sharing-data/). | | **Content** | Dives | Manage your saved [Dives](/key-tasks/ai-and-motherduck/dives/). | --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/foursquare # Foursquare > Foursquare Open Source Places (FSQ OS Places) is a global, open-source dataset of over 100 million points of interest (POI) ## Explore the data Interactive dashboard built on the Foursquare Open Source Places dataset. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). ## About the dataset [Foursquare](https://docs.foursquare.com/data-products/docs/fsq-places-open-source) Open Source Places (FSQ OS Places) is a global, open-source dataset of over 100 million points of interest (POI), featuring 22 core attributes, updated monthly, and designed to support geospatial applications with a collaborative, AI- and human-powered data curation system. This database is updated monthly, we host however a snapshot of 2025-01-10. You have two tables : - `fsq_os_places` (Places) : a global dataset of over 100 million points of interest (POIs) with detailed location, business, and contact information. - `fsq_os_categories` (Categories) : a hierarchical classification of POIs with up to six levels, detailing category names and IDs. :::note `aws-us-east-1` region only This database is only available for accounts in the `aws-us-east-1` region. ::: You can attach the `foursquare` database to your account by running the following command: ```sql ATTACH 'md:_share/foursquare/0cbf467d-03b0-449e-863a-ce17975d2c0b' AS foursquare; ``` ## Example queries The following queries assume that the current database connected is `foursquare`. Run `use foursquare` to switch to it. ### Countries with the most places ```sql SELECT country, COUNT(*) AS places FROM fsq_os_places GROUP BY country ORDER BY places DESC LIMIT 10; ``` ## Schema ### fsq_os_places - places dataset | Column Name | Type | Description | |--------------------|------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | fsq_place_id | String | The unique identifier of a Foursquare POI. Use this ID to view a venue at: `foursquare.com/v/{fsq_place_id}ud` | | name | String | Business name of a POI | | latitude/longitude | Decimal | Decimal coordinates (WGS84 datum) up to 6 decimal places. Derived from third-party sources, user input, and corrections. Default geocode type: front door or rooftop. | | address | String | User-entered street address of the venue | | locality | String | City, town, or equivalent where the POI is located | | region | String | State, province, or territory. Abbreviations used in US, CA, AU, BR; full names elsewhere | | postcode | String | Postal code or equivalent, formatted based on country (e.g., 5-digit US ZIP code) | | admin_region | String | Additional sub-division (e.g., Scotland) | | post_town | String | Town/place used in postal addressing (may differ from geographic location) | | po_box | String | Post Office Box | | country | String | 2-letter ISO Country Code | | date_created | Date | Date the POI entered the database (not necessarily the opening date) | | date_refreshed | Date | Last date any reference was refreshed through crawl, users, or validation | | date_closed | Date | Date the POI was marked closed in the database (not necessarily actual closure date) | | tel | String | Telephone number with local formatting | | website | String | URL to the POI’s (or chain’s) website | | email | String | Primary contact email address, if available | | facebook_id | String | POI's Facebook ID, if available | | instagram | String | POI's Instagram handle, if available | | twitter | String | POI's Twitter handle, if available | | fsq_category_ids | Array (String) | ID(s) of the most granular category(ies). See the Categories page for details | | fsq_category_labels| Array (String) | Label(s) of the most granular category(ies). See the Categories page for details | | placemaker_url | String | Link to the POI’s review page in PlaceMaker Tools for suggesting edits or reviewing pending changes | | geom | wkb | Geometry of the POI in WKB format for visualization through the vector tiling service | | bbox | struct | An area defined by two longitudes and two latitudes: latitude is a decimal number between -90.0 and 90.0; longitude is a decimal number between -180.0 and 180.0. `bbox:struct xmin:double ymin:double xmax:double ymax:double` | --- ### fsq_os_categories - category dataset | Column Name | Type | Description | |----------------------|---------|-----------------------------------------------------------------------------------------------------| | category_id | String | Unique identifier of the Foursquare category (BSON format) | | category_level | Integer | Hierarchy depth of the category (1-6) | | category_name | String | Name of the most granular category | | category_label | String | Full category hierarchy separated by `>` | | level1_category_id | String | Unique ID of the first-level category | | level1_category_name | String | Name of the first-level category | | level2_category_id | String | Unique ID of the second-level category | | level2_category_name | String | Name of the second-level category | | level3_category_id | String | Unique ID of the third-level category | | level3_category_name | String | Name of the third-level category | | level4_category_id | String | Unique ID of the fourth-level category | | level4_category_name | String | Name of the fourth-level category | | level5_category_id | String | Unique ID of the fifth-level category | | level5_category_name | String | Name of the fifth-level category | | level6_category_id | String | Unique ID of the sixth-level category | | level6_category_name | String | Name of the sixth-level category | --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/nodejs # Node.js > MotherDuck + Node.js The MotherDuck integration with Node.js uses the `@duckdb/node-api` package. For more information, see [Node.js (Neo)](https://duckdb.org/docs/stable/clients/node_neo/overview.html) in DuckDB Documentation. This package replaces the deprecated `duckdb` npm package. --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/nyc-311-data # NYC 311 Complaint Data > New York City provides data from 311 call service requests. This data can be used as sample data for DuckDB and MotherDuck SQL queries. ## Explore the data Interactive dashboards built on the NYC sample datasets. Use them as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). ## About the dataset The [New York City 311 Service Requests Data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9) provides information on requests to the city's complaint service from 2010 to the present. NYC311 responds to thousands of inquiries, comments and requests from customers every single day. This dataset represents only service requests that can be directed to specific agencies. This dataset is updated daily and expected values for many fields will change over time. The lists of expected values associated with each column are not exhaustive. Each row of data contains information about the service request, including complaint type, responding agency, and geographic location. However the data does not reveal any personally identifying information about the customer who made the request. This dataset describes site-specific non-emergency complaints (also known as “service requests”) made by customers across New York City about a variety of topics, including noise, sanitation, and street quality. To read from the `sample_data` database, please refer to [attach the sample datasets database](./datasets.mdx) ## Example queries ### The most common complaints in 2018 ```sql SELECT UPPER(complaint_type), COUNT(1) FROM sample_data.nyc.service_requests WHERE DATE_PART('year', created_date) = 2018 GROUP BY 1 HAVING COUNT(*) > 1000 ORDER BY 2 DESC; ``` ## Schema The columns have been renamed to `lower_case_underscore` format for ease of typing. For more details on column data than below, see the associated data dictionary at that link above, in an Excel file. | column_name | column_type | null | description | |--------------------------------|---------------|--------|-------------| | unique_key | BIGINT | YES | Unique identifier of a Service Request (SR) in the open data set. Each 311 service request is assigned a number that distinguishes it as a separate case incident. | | created_date | TIMESTAMP | YES | The date and time that a Customer submits a Service Request. | | closed_date | TIMESTAMP | YES | The date and time that an Agency closes a Service Request. | | agency | VARCHAR | YES | Acronym of responding City Government Agency or entity responding to 311 Service Request. | | agency_name | VARCHAR | YES | Full agency name of responding City Government Agency, or entity responding to 311 service request. | | complaint_type | VARCHAR | YES | This is the first level of a hierarchy identifying the topic of the incident or condition. Complaint Type broadly describes the topic of the incident or condition and are defined by the responding agencies. | | descriptor | VARCHAR | YES | This is associated to the Complaint Type, and provides further detail on the incident or condition. Descriptor values are dependent on the Complaint Type, and are not always required in the service request. | | location_type | VARCHAR | YES | Describes the type of location used in the address information | | incident_zip | VARCHAR | YES | Zip code of the incident address | | incident_address | VARCHAR | YES | House number and street name of incident address | | street_name | VARCHAR | YES | Street name of incident address | | cross_street_1 | VARCHAR | YES | First Cross street based on the geo validated incident location.| | cross_street_2 | VARCHAR | YES | Second Cross Street based on the geo validated incident location | | intersection_street_1 | VARCHAR | YES | First intersecting street based on geo validated incident location | | intersection_street_2 | VARCHAR | YES | Second intersecting street based on geo validated incident location | | address_type | VARCHAR | YES | Type of information available about the incident location: Address; Block face; Intersection; LatLong; Placename | | city | VARCHAR | YES | In this dataset, City can refer to a borough or neighborhood. MANHATTAN, BROOKLYN, BRONX, STATEN ISLAND, or in QUEENS, specific neighborhood name | | landmark | VARCHAR | YES | If the incident location is identified as a Landmark the name of the landmark will display here. Can refer to any noteworthy location, including but not limited to, parks, hospitals, airports, sports facilities, performance spaces, etc. | | facility_type | VARCHAR | YES | If applicable, this field describes the type of city facility associated to the service request: DSNY Garage, Precinct, School, School District, N/A | | status | VARCHAR | YES | Current status of the service request submitted: Assigned, Canceled, Closed, Pending | | due_date | TIMESTAMP | YES | Date when responding agency is expected to update the SR. This is based on the Complaint Type and internal Service Level Agreements (SLAs) | | resolution_description | VARCHAR | YES | Describes the last action taken on the service request by the responding agency. May describe next or future steps. | | resolution_action_updated_date | TIMESTAMP | YES | Date when responding agency last updated the service request. | | bbl | VARCHAR | YES | Parcel number that identifies the location of the building or property associated with the service request. The block is a subset of a borough. The lot is a subset of a block unique within a borough and block. | | borough | VARCHAR | YES | The borough number is: 1. Manhattan (New York County) 2. Bronx (Bronx County) 3. Brooklyn (Kings County) 4. Queens (Queens County) 5. Staten Island (Richmond County) | | x_coordinate_state_plane | VARCHAR | YES | Geo validated, X coordinate of the incident location. X coordinate of the incident location. For more information about NY State Plane Coordinate Zones: https://data.gis.ny.gov/datasets/ny-state-plane-coordinate-system-zones/explore | | y_coordinate_state_plane | VARCHAR | YES | Geo validated, Y coordinate of the incident location. Y coordinate of the incident location. For more information about NY State Plane Coordinate Zones: https://data.gis.ny.gov/datasets/ny-state-plane-coordinate-system-zones/explore | | open_data_channel_type | VARCHAR | YES | Indicates how the service request was submitted to 311: Phone, Online, Other (submitted by other agency) | | park_facility_name | VARCHAR | YES | If the incident location is a Parks Dept facility and service requests pertains to a facility managed by NYC Parks (DPR), the name of the facility will appear here | | park_borough | VARCHAR | YES | The borough of incident if the service request is pertaining to a NYC Parks Dept facility (DPR) | | vehicle_type | VARCHAR | YES | Data provided if service request pertains to a vehicle managed by the Taxi and Limousine Commission (TLC): Ambulette / Paratransit; Car Service; Commuter Van; Green Taxi | | taxi_company_borough | VARCHAR | YES | Data provided if service request pertains to a vehicle managed by the Taxi and Limousine Commission (TLC). | | taxi_pick_up_location | VARCHAR | YES | If the incident pertains a vehicle managed by the Taxi and Limousine Commission (TLC), this field displays the taxi pick up location | | bridge_highway_name | VARCHAR | YES | If the incident is identified as a Bridge/Highway, the name will be displayed here | | bridge_highway_direction | VARCHAR | YES | If the incident is identified as a Bridge/Highway, the direction where the issue took place would be displayed here. | | road_ramp | VARCHAR | YES | If the incident location was Bridge/Highway this column differentiates if the issue was on the Road or the Ramp. | | bridge_highway_segment | VARCHAR | YES | Additional information on the section of the Bridge/Highway were the incident took place. | | latitude | DOUBLE | YES | Geo based Latitude of the incident location in decimal degrees | | longitude | DOUBLE | YES | Geo based Longitude of the incident location in decimal degrees | | community_board | VARCHAR | YES | Community boards are local representative bodies. There are 59 community boards throughout the City. For more information on Community Boards: [NYC government website](https://www.nyc.gov/site/cau/community-boards/community-boards.page) | --- Source: https://motherduck.com/docs/getting-started/interfaces/postgres-endpoint # Postgres endpoint (thin client) > Query MotherDuck from any Postgres-compatible client without installing DuckDB MotherDuck's Postgres endpoint lets you query your databases using any client that speaks the PostgreSQL wire protocol — no DuckDB installation required. This is ideal for serverless environments, BI tools, or languages without a DuckDB SDK. ## Quick start with psql Set your access token and connect: ```bash export MOTHERDUCK_TOKEN="your_token_here" PGPASSWORD=$MOTHERDUCK_TOKEN psql \ -h pg.us-east-1-aws.motherduck.com \ -p 5432 \ -U postgres \ "dbname=sample_data sslmode=verify-full sslrootcert=system" ``` Run a query: ```sql SELECT title, score FROM sample_data.hn.hacker_news WHERE type = 'story' ORDER BY score DESC LIMIT 5; ``` ## Quick start with Python ```python # /// script # dependencies = ["psycopg"] # /// import psycopg, os conn = psycopg.connect( host="pg.us-east-1-aws.motherduck.com", port=5432, dbname="sample_data", user="postgres", password=os.environ["MOTHERDUCK_TOKEN"], sslmode="verify-full", sslrootcert="system", ) with conn.cursor() as cur: cur.execute("SELECT title, score FROM sample_data.hn.hacker_news WHERE type='story' LIMIT 5") for row in cur: print(row) conn.close() ``` ## Key things to know - You're writing **DuckDB SQL**, not PostgreSQL SQL. Queries and MotherDuck SQL that run entirely inside MotherDuck generally work, but the Postgres endpoint is not a full DuckDB client. - Commands that depend on **local files, local attachments, or extension management** are not supported over the Postgres endpoint. - The Postgres endpoint is best for query execution, DDL and DML on MotherDuck tables, metadata inspection, and server-side reads from remote storage. - Features that depend on DuckDB client session state, such as temporary tables or result creation, require a DuckDB client path instead. - Always connect with **SSL enabled** (`sslmode=verify-full` recommended). - Use your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) as the password. ## Next steps - [Postgres Endpoint reference](/sql-reference/postgres-endpoint) — connection parameters, SSL options, session options, and known limitations - [Connect from Python](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/python) — psycopg2 and psycopg3 setup - [Connect from Java](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/java) — PostgreSQL JDBC driver setup - [Connect from Node.js](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/nodejs) — node-postgres setup - [Connect from Cloudflare Workers](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/cloudflare-workers) — serverless edge deployment --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/query-data # Query data > Execute SQL queries against MotherDuck using Python with hybrid local and cloud execution. For more information about database manipulation, see [MotherDuck SQL reference](/docs/sql-reference/motherduck-sql-reference/). MotherDuck uses DuckDB under the hood, so nearly all [DuckDB SQL](https://duckdb.org/docs/) works in MotherDuck without differences. MotherDuck leverages "hybrid execution" to decide the best location to execute queries, including across multiple locations. For example, if your data lives on your laptop, MotherDuck executes queries against that data on your laptop. Similarly, if you are joining data on your laptop to data on Amazon S3, MotherDuck executes each part of the query where data lives before bringing it together to be joined locally. ## Querying data In MotherDuck You can query data loaded into MotherDuck the same way you query data in your DuckDB databases. MotherDuck executes these queries using resources in the cloud. ```sql # table table_name is in MotherDuck storage con.sql("SELECT * FROM table_name").show(); ``` ## Querying data on your machine You can use MotherDuck to query files on your local machine. These queries execute using your machine's resources. ```sql # query a Parquet file on your local machine con.sql("SELECT * FROM '~/file.parquet'").show(); # query a table in a local DuckDB database con.sql("SELECT * FROM local_table").show(); ``` ## Joining data across multiple locations You can use MotherDuck to join data: - In MotherDuck - On S3 or other cloud object stores (Azure, GCS, R2, etc) - On your local machine ## What's next ? Ready to share your DuckDB data with your colleagues? Read up on [Sharing In MotherDuck](/key-tasks/sharing-data/sharing-data.mdx). --- Source: https://motherduck.com/docs/getting-started/mcp-getting-started # Talk to Your Data with AI > Get started with the MotherDuck MCP Server to analyze your data using natural language with Claude, ChatGPT, and other AI assistants The MotherDuck **remote** MCP Server lets you analyze your data using natural language and generate interactive visualizations, all without writing SQL. Connect your favorite AI assistant (Claude, ChatGPT, Cursor, or others) and start asking questions about your databases, then turn insights into shareable [Dives](/key-tasks/ai-and-motherduck/dives) with a single prompt. :::info Connection URL The remote MCP server is hosted at `https://api.motherduck.com/mcp`. Claude Desktop's connector uses this URL automatically; for clients that need manual configuration, see the [setup guide](/key-tasks/ai-and-motherduck/mcp-setup/). ::: :::note This guide covers the **remote MCP server** (fully managed by MotherDuck). If you need to work with local DuckDB files or want full control over the server, see the [local MCP server](/key-tasks/ai-and-motherduck/mcp-setup/#remote-vs-local-mcp-server). ::: In this guide, you'll connect the MCP server in Claude Desktop, query your data, and create a Dive visualization, all in under 5 minutes. ## What you'll learn - Connect the MotherDuck MCP Server to Claude Desktop - List your databases - Ask analytical questions about your data - Create an interactive Dive visualization from your analysis ## Prerequisites - A MotherDuck account ([sign up free](https://app.motherduck.com/)) - Claude Desktop installed ([download](https://claude.ai/download)) :::tip Using a different AI client? This guide uses Claude Desktop, but the remote MCP Server works with ChatGPT, Cursor, Claude Code, and other MCP-compatible clients. See the [full setup guide](/key-tasks/ai-and-motherduck/mcp-setup/) for instructions for your preferred client. ::: ## Step 1: Add the MCP server to Claude Desktop Open Claude Desktop settings and add the MotherDuck remote MCP Server: 1. Open **Claude Desktop** → **Settings** → **Connectors** 2. Click **Browse Connectors** and search for "MotherDuck" 3. Click **Add** to install the MotherDuck connector 4. A browser window opens for authentication with your MotherDuck account ## Step 2: Verify the connection and permissions After adding the connector, confirm Claude has access to the MotherDuck tools: 1. Open **Claude Desktop** → **Settings** → **Connectors** 2. Select **MotherDuck** and click on **Configure** You should see tools like `query`, `list_databases`, and `ask_docs_question` available. You can configure tool permissions to control how Claude uses each tool. See [Configuring tool permissions](/key-tasks/ai-and-motherduck/mcp-setup/#configuring-tool-permissions) for details. ## Step 3: List your databases Test the connection by asking Claude to list your databases: **Try this prompt:** ```text List all my databases on MotherDuck. ``` Claude will use the MCP tools to connect to MotherDuck and return your database list. ## Step 4: Analyze your data Now let's run an actual analysis. If you don't have data yet, you can attach the sample Hacker News database: **Attach the sample database:** ```text Attach this db 'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b' give me some analytics. ``` The `hacker_news` database contains Hacker News stories, comments, and metadata from 2016 to 2025. You'll see that even with a minimal prompt, you get great results for a first data exploration. For more tips on effective prompting and workflow patterns, check out the [MCP Workflows Guide](/key-tasks/ai-and-motherduck/mcp-workflows/).
:::info Sample databases The `hacker_news` database is one of several sample datasets available. See [Sample Data & Queries](/getting-started/sample-data-queries/datasets) for more datasets to explore. ::: ## Step 5: Create visualizations with Dives Now that you've explored your data, turn your insights into a persistent, interactive visualization. [Dives](/key-tasks/ai-and-motherduck/dives) are shareable visualizations that live in your MotherDuck workspace and stay up to date with your data. **Try this prompt:** ```text Create a Dive based on these insights. ``` Claude renders the Dive inline in the conversation with the Dive Viewer MCP App, using the same components as the MotherDuck UI and running against live data. Iterate conversationally: *"add a filter for the last 30 days"*, *"switch to a bar chart"*. Each edit saves as a separate version. ```text Save it to MotherDuck. ``` The Dive is saved to your workspace. You can open it in the MotherDuck UI, share it with your team, and it will always query live data. ## Next steps You're now ready to analyze your data and create visualizations with AI. Here are some ways to go deeper: - **[MCP Workflows Guide](/key-tasks/ai-and-motherduck/mcp-workflows/)**: Best practices and workflow patterns, including [how it works under the hood](/key-tasks/ai-and-motherduck/mcp-workflows/#how-it-works) - **[Creating Visualizations with Dives](/key-tasks/ai-and-motherduck/dives/)**: Go deeper into Dives by iterating on visualizations, sharing with your team, and managing version history - **[Connect to MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/)**: Setup instructions for ChatGPT, Cursor, Claude Code, and other clients - **[MCP Server Reference](/sql-reference/mcp/)**: Server capabilities, available tools, and regional availability - **[Building Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/)**: Build custom AI agents that programmatically query your data --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/odbc # ODBC > MotherDuck + ODBC The MotherDuck integration with ODBC is no different than DuckDB. For more information, see [ODBC](https://duckdb.org/docs/stable/clients/odbc/overview.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/pypi # PyPi Data > Want to know how users find and install software you've developed for the Python Community? This DuckDB and MotherDuck database allows you to use SQL to perform data analysis on PyPi data. ## Explore the data Interactive dashboard built on the DuckDB PyPI download stats. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). ## About the dataset PyPi is the Python Package Index, a repository of software packages for the Python programming language. It is a central repository that allows users to find and install software developed and shared by the Python community. The dataset includes information about packages, releases, and downloads on the `duckdb` python package. It's refreshed **weekly** and you can visit the [DuckDB Stats dashboard](https://duckdbstats.com). ## How to query the dataset A dedicated shared database is maintained to query the dataset. :::note `aws-us-east-1` region only This database is only available for accounts in the `aws-us-east-1` region. ::: To attach it to your workspace, you can use the following command: ```sql ATTACH 'md:_share/duckdb_stats/1eb684bf-faff-4860-8e7d-92af4ff9a410' AS duckdb_stats; ``` ## Example queries The following queries assume that the current database connected is `duckdb_stats`. Run `use duckdb_stats` to switch to it. ### Get weekly download stats ```sql SELECT DATE_TRUNC('week', download_date) AS week_start_date, version, country_code, python_version, SUM(daily_download_sum) AS weekly_download_sum FROM duckdb_stats.main.pypi_daily_stats GROUP BY ALL ORDER BY week_start_date ``` ## Schema ### pypi_file_downloads This table contains the raw data. Each row represents a download from PyPi. | column_name | column_type | null | |--------------|----------------------------------------------------------------------------------------------------------------|------| | timestamp | TIMESTAMP | YES | | country_code | VARCHAR | YES | | url | VARCHAR | YES | | project | VARCHAR | YES | | file | STRUCT(filename VARCHAR, project VARCHAR, "version" VARCHAR, "type" VARCHAR) | YES | | details | STRUCT("installer" STRUCT("name" VARCHAR, "version" VARCHAR), "python" VARCHAR, "implementation" STRUCT("name" VARCHAR, "version" VARCHAR), "distro" STRUCT("name" VARCHAR, "version" VARCHAR, "id" VARCHAR, "libc" STRUCT("lib" VARCHAR, "version" VARCHAR)), "system" STRUCT("name" VARCHAR, "release" VARCHAR), "cpu" VARCHAR, "openssl_version" VARCHAR, "setuptools_version" VARCHAR, "rustc_version" VARCHAR, "ci" BOOLEAN) | YES | | tls_protocol | VARCHAR | YES | | tls_cipher | VARCHAR | YES | ### pypi_daily_stats This table is a daily aggregation of the raw data. It contains the following columns: | column_name | column_type | null | |-------------------|-------------|------| | load_id | VARCHAR | YES | | download_date | DATE | YES | | system_name | VARCHAR | YES | | system_release | VARCHAR | YES | | version | VARCHAR | YES | | project | VARCHAR | YES | | country_code | VARCHAR | YES | | cpu | VARCHAR | YES | | python_version | VARCHAR | YES | | daily_download_sum| BIGINT | YES | --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/stackoverflow # StackOverflow Data > Sample data from StackOverflow to use with DuckDB and MotherDuck to understand SQL-based data analytics. ## Explore the data Interactive dashboard built on the full Stack Overflow archive. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). ## About the dataset [Stack Overflow](https://stackoverflow.com/) is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions. The dataset includes a complete **data dump up to May 2023**, covering posts, comments, users, badges, and related metrics. You can read more about the dataset in our blog series [part 1](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-1/) and [part 2](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-2/). ## How to query the dataset As this dataset is quite large, it's not part of the `sample_data` database. Instead, you can find it as a dedicated shared database. :::note `aws-us-east-1` region only This database is only available for accounts in the `aws-us-east-1` region. ::: To attach it to your workspace, you can use the following command: ```sql ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow; ``` ## Example queries The following queries assume that the current database connected is `stackoverflow`. Run `use stackoverflow` to switch to it. ### List the top 5 posts that received the most votes ```sql SELECT posts.Title, COUNT(votes.Id) AS VoteCount FROM posts JOIN votes ON posts.Id = votes.PostId GROUP BY posts.Title ORDER BY VoteCount DESC LIMIT 5; ``` ### Find the top 5 posts with the highest view count: ```sql SELECT Title, ViewCount FROM posts ORDER BY ViewCount DESC LIMIT 5; ``` ## Schema ### Badges | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | UserId | BIGINT | YES | | | | | Name | VARCHAR | YES | | | | | Date | TIMESTAMP | YES | | | | | Class | BIGINT | YES | | | | | TagBased | BOOLEAN | YES | | | | ### Comments | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | PostId | BIGINT | YES | | | | | Score | BIGINT | YES | | | | | Text | VARCHAR | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | UserId | BIGINT | YES | | | | | ContentLicense | VARCHAR | YES | | | | ### Post links | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | PostId | BIGINT | YES | | | | | RelatedPostId | BIGINT | YES | | | | | LinkTypeId | BIGINT | YES | | | | ### Posts | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | PostTypeId | BIGINT | YES | | | | | AcceptedAnswerId | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | Score | BIGINT | YES | | | | | ViewCount | BIGINT | YES | | | | | Body | VARCHAR | YES | | | | | OwnerUserId | BIGINT | YES | | | | | LastEditorUserId | BIGINT | YES | | | | | LastEditorDisplayName | VARCHAR | YES | | | | | LastEditDate | TIMESTAMP | YES | | | | | LastActivityDate | TIMESTAMP | YES | | | | | Title | VARCHAR | YES | | | | | Tags | VARCHAR | YES | | | | | AnswerCount | BIGINT | YES | | | | | CommentCount | BIGINT | YES | | | | | FavoriteCount | BIGINT | YES | | | | | CommunityOwnedDate | TIMESTAMP | YES | | | | | ContentLicense | VARCHAR | YES | | | | ### Tags | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | TagName | VARCHAR | YES | | | | | Count | BIGINT | YES | | | | | ExcerptPostId | BIGINT | YES | | | | | WikiPostId | BIGINT | YES | | | | ### Votes | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | PostId | BIGINT | YES | | | | | VoteTypeId | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | ### Users | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | Reputation | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | DisplayName | VARCHAR | YES | | | | | LastAccessDate | TIMESTAMP | YES | | | | | AboutMe | VARCHAR | YES | | | | | Views | BIGINT | YES | | | | | UpVotes | BIGINT | YES | | | | | DownVotes | BIGINT | YES | | | | --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/stackoverflow-survey # StackOverflow Survey Data > Data from the StackOverflow Developer Survey from 2017 to 2024. ## Explore the data Interactive dashboard built on the survey data. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). ## About the dataset Each year, [Stack Overflow conducts a survey](https://survey.stackoverflow.co/) of developers to understand the trends in the developer community. The survey covers a wide range of topics, including programming languages, frameworks, databases, and platforms, as well as developer demographics, education, and career satisfaction. Starting from 2017, StackOverflow provided consistent schema and data format for the survey data, making it a great dataset to analyze trends in the developer community over the years. The source is data are a series of CSV files that has been merged into a single schema with two tables for easy querying. ## How to query the dataset This dataset is available as part of the `sample_data` database, which is automatically attached to every MotherDuck account. ## Example queries ### List the most popular programming languages in 2024 ```sql SELECT language, COUNT(*) AS count FROM ( SELECT UNNEST(STRING_SPLIT(LanguageHaveWorkedWith, ';')) AS language FROM sample_data.stackoverflow_survey.survey_results where year='2024' ) AS languages GROUP BY language ORDER BY count DESC; ``` ### Top 10 countries with the most respondents in 2024 ```sql SELECT Country, COUNT(*) AS Respondents FROM sample_data.stackoverflow_survey.survey_results WHERE year = '2024' GROUP BY Country ORDER BY Respondents DESC LIMIT 10; ``` ### Correlation between remote work and job satisfaction in 2024 ```sql SELECT RemoteWork, AVG(CAST(JobSat AS DOUBLE)) AS AvgJobSatisfaction, COUNT(*) AS RespondentCount FROM sample_data.stackoverflow_survey.survey_results WHERE JobSat NOT IN ('NA', 'Slightly satisfied', 'Neither satisfied nor dissatisfied', 'Very dissatisfied', 'Very satisfied', 'Slightly dissatisfied') AND RemoteWork NOT IN ('NA') AND YEAR='2024' GROUP BY ALL ``` ## Schema ### stackoverflow_survey.survey_results This table contains all the survey results from 2017 to 2024. Each column represents a question from the survey. As questions change from year to year, the columns may vary a bit and the table is quite large. ### stackoverflow_survey.survey_schema This table contains the schema of the survey results. `qname` is the name of the question, which is also the column name in the `survey_results` table. `question` is the full question text. | Column Name | Column Type | |---------------|-------------| | qname | VARCHAR | | question | VARCHAR | | qid | VARCHAR | | force_resp | VARCHAR | | type | VARCHAR | | selector | VARCHAR | | year | VARCHAR | --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/r # R > MotherDuck + R The MotherDuck integration with R is no different than DuckDB. For more information, see [R](https://duckdb.org/docs/stable/clients/r.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/rust # Rust > MotherDuck + Rust The MotherDuck integration with Rust is no different than DuckDB. For more information, see [Rust](https://duckdb.org/docs/stable/clients/rust.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/wasm # WebAssembly (Wasm) > MotherDuck + WebAssembly The MotherDuck offers its own fork of DuckDB Wasm, which is [documented here](/sql-reference/wasm-client/). For more information about DuckDB Wasm, see [WebAssembly](https://duckdb.org/docs/stable/clients/wasm/overview.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/index # Client APIs > Client APIs for MotherDuck MotherDuck works with all DuckDB client APIs. Choose your preferred language or driver below. ## Included pages - [Python](https://motherduck.com/docs/getting-started/interfaces/client-apis/python): Connect and query MotherDuck from Python - [Other Client APIs](https://motherduck.com/docs/getting-started/interfaces/client-apis/other): Other DuckDB client APIs that work with MotherDuck --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/datasets # Example Datasets > A collections of open datasets and queries to get you started with DuckDB and MotherDuck We have prepared a series of datasets for you to [dive](/key-tasks/ai-and-motherduck/dives/) into MotherDuck! ## sample_data The `sample_data` database is automatically attached to every MotherDuck account regardless of your region. You can start querying the following tables right away: | `schema.table` | Description | |--------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------| | [`who.ambient_air_quality`](air-quality.md) | Historical air quality data from the World Health Organization. | | [`nyc.taxi`](nyc-311-data.md) | Taxi ride data from November 2020 | | [`nyc.rideshare`](nyc-311-data.md) | Ride share trips (Lyft, Uber etc) in NYC | | [`nyc.service_requests`](nyc-311-data.md) | Requests to NYC's 311 complaint hotline through phone and web | | [`hn.hacker_news`](hacker-news.md) | Sample of comments from [Hacker News](https://news.ycombinator.com/) | | [`kaggle.movies`](kaggle-movies.md) | Movie titles and overviews with pre-computed embeddings from [Kaggle](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset) | | [`stackoverflow_survey.survey_results`](stackoverflow-survey.md) | Survey results from 2017 to 2024 | | [`stackoverflow_survey.survey_schemas`](stackoverflow-survey.md) | Survey schemas (questions from the survey) from 2017 to 2024 | ## Additional datasets The following datasets are available as separate shared databases. See each dataset's page for instructions on how to attach them. :::note `aws-us-east-1` region only These additional databases are only available for accounts in the `aws-us-east-1` region. ::: | Dataset | Description | |--------------------------------------------|---------------------------------------------------------------------------------------| | [StackOverflow](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | | [PyPi / DuckDB Stats](pypi.md) | Python package download data for the `duckdb` package, refreshed weekly | | [Hacker News (full)](hacker-news.md) | Full [Hacker News](https://news.ycombinator.com/) dataset from 2016 to 2025 | | [Foursquare](foursquare.md) | Global dataset of over 100 million points of interest (POIs) with location and business information | ## FAQ ### How do I re-attach the sample_data database? The `sample_data` database is attached automatically, but if you have accidentally removed it, you can re-attach it with: ```sql ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data; ``` --- Source: https://motherduck.com/docs/getting-started/getting-started # MotherDuck documentation > Getting started with MotherDuck serverless cloud data warehouse. MotherDuck is a serverless cloud data warehouse built on DuckDB. It's designed for fast, interactive SQL analytics without the infrastructure overhead. Build a modern data warehouse for BI, or power customer-facing analytics in your app. Develop and iterate locally, share and scale in the cloud when you need it. --- Source: https://motherduck.com/docs/getting-started/interfaces/interfaces # MotherDuck Interfaces > MotherDuck Offers a variety of interfaces (APIs) for integration ## Client Interfaces ## Included pages - [Client APIs](https://motherduck.com/docs/getting-started/interfaces/client-apis): Client APIs for MotherDuck - [Install and connect with the DuckDB CLI](https://motherduck.com/docs/getting-started/interfaces/connect-query-from-duckdb-cli): Learn to connect and query databases using MotherDuck from the DuckDB CLI - [MotherDuck Web UI](https://motherduck.com/docs/getting-started/interfaces/motherduck-quick-tour): A guide to the MotherDuck Web UI — write SQL with Instant SQL, use AI to fix and edit queries, and explore your data interactively. - [Postgres endpoint (thin client)](https://motherduck.com/docs/getting-started/interfaces/postgres-endpoint): Query MotherDuck from any Postgres-compatible client without installing DuckDB --- Source: https://motherduck.com/docs/getting-started/e2e-tutorial/e2e-tutorial # MotherDuck tutorial > Complete end-to-end tutorial to get started with MotherDuck and DuckDB This comprehensive guide will take you from your first query to sharing databases with your team. ## What you'll learn This tutorial is in 3 parts, you'll discover how to: - 🔍 **[1. Query shared data](./part-1)** - Run your first SQL queries on publicly available datasets - 📊 **[2. Load your own data](./part-2)** - Upload and work with your own data from files and datasets - 🤝 **[3. Share databases](./part-3)** - Collaborate by sharing databases with team members :::tip Each part of this tutorial builds on the previous one, but you can also jump to specific sections if you're looking to learn particular features. ::: ## Prerequisites To follow this tutorial, you'll need: - A **MotherDuck account** ([sign up for free](https://app.motherduck.com/)) - Basic **SQL knowledge** (we'll guide you through the queries) - You have several ways to run the queries: * Execute them directly on this documentation website 🪄 * Use the [MotherDuck UI](https://app.motherduck.com) for the full interface experience * Connect with any [DuckDB client](../interfaces/)(Python, Java, DuckDB CLI) of your choice **⏱️ Estimated time:** 20-30 minutes for the complete tutorial Let's get started! 🚀 --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/index # Other Client APIs > Other DuckDB client APIs that work with MotherDuck DuckDB supports various client APIs that work seamlessly with MotherDuck. For the complete list of client APIs, see the [DuckDB Documentation](https://duckdb.org/docs/stable/clients/overview.html). ## Included pages - [C](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/c): MotherDuck + C - [Go](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/go): MotherDuck + GoLang - [Java (JDBC)](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/java): MotherDuck + Java - [Node.js](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/nodejs): MotherDuck + Node.js - [ODBC](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/odbc): MotherDuck + ODBC - [R](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/r): MotherDuck + R - [Rust](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/rust): MotherDuck + Rust - [WebAssembly (Wasm)](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/wasm): MotherDuck + WebAssembly --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/index # Python > Connect and query MotherDuck from Python Learn how to connect to MotherDuck and query your data using Python. ## Included pages - [DuckDB Python installation and authentication](https://motherduck.com/docs/getting-started/interfaces/client-apis/python/installation-authentication): How to install DuckDB and connect to MotherDuck - [Specify MotherDuck database](https://motherduck.com/docs/getting-started/interfaces/client-apis/python/choose-database): Specify MotherDuck database - [Loading data into MotherDuck with Python](https://motherduck.com/docs/getting-started/interfaces/client-apis/python/loading-data-into-md): Load CSV, Parquet, and JSON files into MotherDuck from local, S3, or HTTPS sources using Python. - [Query data](https://motherduck.com/docs/getting-started/interfaces/client-apis/python/query-data): Execute SQL queries against MotherDuck using Python with hybrid local and cloud execution.