---
title: About MotherDuck
sidebar_class_name: about-motherduck-icon
description: About MotherDuck
---

import DocCardList from '@theme/DocCardList';

---

---
title: MotherDuck Billing
description: Learn more about MotherDuck's pricing model and how to manage billing.
---

import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';

MotherDuck offers free and paid [billing](https://motherduck.com/pricing/) plans. You can view your Organization's incurred usage, track spend, and view your invoices. All new users start on a 21-day Free Trial.

import DocCardList from '@theme/DocCardList';

---

---
sidebar_position: 3
title: Instance Types
description: Learn about instance types.
---

MotherDuck's tenancy architecture diverges from traditional database systems: the platform uses a per-user tenancy model, provisioning an isolated read-write instance for each Organization member. This gives every user dedicated compute resources and instance-level configuration, eliminating the resource contention commonly found in shared compute environments, and lets each user independently tune performance to their specific workload requirements.

## Instance Types

### PULSE

**Optimized for ad-hoc analytics tasks and read-only workloads**

- Running ad-hoc queries and analysis
- Read-optimized workflows with high concurrent user access
- Customer-facing analytics for data apps and embedded analytics

### STANDARD

**Production-grade instance for analytical processing and reporting**

- Core analytical workflows requiring balanced performance
- Development and validation environment for production workflows
- Standard ETL/ELT pipeline implementation, such as:
  - Parallel execution of incremental ingestion jobs
  - Multi-threaded transformation processing

### JUMBO

**Enterprise-scale instance for high-throughput processing**

- Large-scale batch ingestion operations
- Complex query execution on high-volume datasets
- Advanced join operations and aggregations
- RAM-intensive processing of deeply nested JSON structures

---

---
sidebar_position: 2
title: Managing your bill
description: Learn how to manage your MotherDuck spend.
---

import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';

## Choosing Your Billing Plan

During your Free Trial you may choose what happens once the trial ends. You can make this election on the [Plans page](https://app.motherduck.com/settings/plans) under Settings in the MotherDuck UI:

- If you select "Free", upon completion of the Free Trial your organization will be on the [Free Plan](https://www.motherduck.com/pricing/), subject to its limitations.
- If you select "Lite", upon completion of the Free Trial your organization will be subject to the [Lite Plan](https://www.motherduck.com/pricing/). You will also be prompted to add payment information.
- If you select "Business", upon completion of the Free Trial your organization will be subject to the [Business Plan](https://www.motherduck.com/pricing/). You will also be prompted to add payment information.

## Monitoring Usage

You may monitor your organization's Compute and Storage usage from the [Billing](https://app.motherduck.com/settings/billing) page. Compute usage is displayed in Compute Unit-hours. Storage usage is displayed in prorated Gigabyte-days.
Gigabyte-days are calculated based on the number of new bytes per day, less any bytes that are no longer used after 7 days. As a minimal example, a 20 GB database created on day 1 of a 30-day month and deleted on day 2 will count for 0.67 GB-months (20 GB × 1 day ÷ 30 days).

![Usage](img/billing.png)

## Viewing Your Invoice

The "Billing" page also enables you to view your past invoices, as well as the current month's invoice to date.

- Lite & Business Plan users see their actual invoices.
- [Free Trial](/about-motherduck/billing/pricing#free-trial) users see estimated invoices, fully discounted.
- Invoices are not available for [Free Plan](/about-motherduck/billing/pricing#free-plan) users.

Incurred Storage and Compute costs are broken down per-user, as well as aggregated. Note that invoices are only broken down per-user for organizations with 500 users or fewer.

---

---
sidebar_position: 1
title: Understanding the pricing model
description: Details of MotherDuck's pricing model.
---

import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';

## MotherDuck Pricing Model

MotherDuck is a serverless data warehouse. Naturally, we believe in hassle-free, fair, and efficient pricing. If you're a casual user, such as a student or a hobbyist, we hope that our Free Plan is enough to meet your needs. More details can be found in the [pricing table](https://motherduck.com/pricing/).

- Paid users can use the Lite or Business Plan and incur Compute and Storage costs based on usage.
- Alternatively, you can opt for the Free Plan to access a limited amount of Compute (10 CU-hours / month) and Storage (10 GB).

### Compute Pricing

**A Compute Unit (CU)** in MotherDuck is defined as a measure of CPU and memory usage over time. Depending on the Instance Type, MotherDuck meters compute on demand or per instance.

The [Pulse](/about-motherduck/billing/instances/#pulse) instance is an on-demand, auto-scaling instance that is metered on a per-query basis, with a minimum of 1 CU-second per query.

[Standard](/about-motherduck/billing/instances/#standard) and [Jumbo](/about-motherduck/billing/instances/#jumbo) instances are metered per second that an instance is running, with variable compute costs based on the pricing plan and selected Instance Type. Instances start quickly, usually within 200 milliseconds, and continue running for 60 seconds after the last query completes. Changing Instance Types can take up to two minutes to take effect.

### Storage Pricing

MotherDuck charges you for storing data in its managed storage system, in GB-months, metered per day (see note below). Under the hood, MotherDuck leverages DuckDB's compression algorithms to reduce the storage footprint and optimize for performance.

MotherDuck provides data recoverability. In order to provide this functionality, data is kept around as a "fail-safe" for 7 days, and is billed during that period.

[Shares](/key-tasks/sharing-data) do not incur additional data storage, since sharing is a "zero-copy" operation. Similarly, the [CREATE DATABASE X FROM DATABASE Y](/sql-reference/motherduck-sql-reference/create-database/) command is also a "zero-copy" operation, and only incremental changes made to the new database are added to storage. That is to say, only unique bytes are billed.

As a reminder, MotherDuck does not charge you for:

- Data managed by you in your own object storage bucket.
- Data on your laptop.
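To illustrate the zero-copy behavior described above, here is a minimal sketch (the database, share, table names, and file URL are hypothetical):

```sql
-- Copying a database is zero-copy: no new bytes are billed at creation time.
CREATE DATABASE analytics_dev FROM DATABASE analytics;

-- Creating a share is likewise metadata-only.
CREATE SHARE analytics_share FROM analytics;

-- Only the incremental changes made to the copy add billable storage.
CREATE TABLE analytics_dev.main.new_events AS
    SELECT * FROM 'https://example.com/events.parquet';
```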
#### What changes can I make to optimize my storage bill?

The right approach to optimizing storage usage in MotherDuck varies by use case and implementation. Please reach out to us at support@motherduck.com and we can share guidance on how to optimize your storage effectively.

:::note
We meter storage in terms of calendar-days, where one calendar-day is defined as a calendar-month × 12 / 365. This means that the same amount of storage will cost more in a 31-day month than in a 30-day month.
:::

### AI Function Pricing

MotherDuck's AI functions are priced on a per-unit-consumed basis (AI Units) depending on which functionality is invoked: 1 AI Unit = $1.00.

FixIt & SQL Assistance features (200 calls per day) are included in the Free Plan. Paid plans provide expanded access to AI features on a per-unit-consumed basis:

| Baseline AI Features | Price | # of Calls |
|---|---|---|
| FixIt | FREE | per call |
| SQL Assistant Functions: Text-to-SQL, Explain SQL, etc. | 1 AI Unit | 60 calls |

| Advanced AI Functions | Price | # of Tokens |
|---|---|---|
| Prompt - OpenAI GPT-4o-mini Input | 1 AI Unit | 2,000,000 tokens |
| Prompt - OpenAI GPT-4o-mini Output | 1 AI Unit | 500,000 tokens |
| Prompt - OpenAI GPT-4o Input | 1 AI Unit | 120,000 tokens |
| Prompt - OpenAI GPT-4o Output | 1 AI Unit | 30,000 tokens |
| Embedding - OpenAI text-embedding-3-small | 1 AI Unit | 15,000,000 tokens |
| Embedding - OpenAI text-embedding-3-large | 1 AI Unit | 3,000,000 tokens |

Advanced AI Functions have a default usage limit of 15 AI Units per day. This limit can be adjusted upon request to support@motherduck.com.
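As a hedged sketch of how the metered functions above are invoked (the table and column names are hypothetical):

```sql
-- Text generation with the default model (gpt-4o-mini); input and output
-- tokens are metered against AI Units per the table above.
SELECT prompt('Summarize this review in one sentence: ' || review_text) AS summary
FROM product_reviews
LIMIT 10;

-- Embeddings with text-embedding-3-small (15,000,000 tokens per AI Unit).
SELECT review_id, embedding(review_text) AS review_vector
FROM product_reviews
LIMIT 10;
```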
## Incentive Programs

### Free Trial

You should not have to pay anything to see if MotherDuck fits your needs. To that end, when you sign up for MotherDuck and create an organization, you are granted a 21-day Free Trial. You are not required to enter a credit card. At any point during your Free Trial you may choose to set up billing and become a paid customer. You may also elect to become a Free Plan customer at the end of the Free Trial.

### Free Plan

You may choose to become a Free Plan customer. This decision is made at the organization level. Free Plan customers are not required to set up billing. An organization on a Free Plan is allocated:

- 10 Gigabytes of MotherDuck Storage per month.
- 10 Compute Unit-hours of Compute per month.

You can have a maximum of 5 users in a single Free Plan organization.

If the amount of data you keep in MotherDuck Storage exceeds the Free Plan limit, you lose the ability to query data in MotherDuck Storage. Only `DROP` and `DELETE` SQL commands are permitted until the overage is resolved. You may choose to resolve your Free Plan overage by upgrading to a paid plan. You can do this in the MotherDuck Web UI by navigating to 'Settings' -> 'Plans'.

### Startup Program

Qualifying startups get 50% off their annual contract on our Business Plan, in addition to the 21-day trial. No feature gating, and no hidden fees. Apply by filling out [this short form](https://motherduck.com/startups/#apply-now).

---

---
sidebar_position: 13
title: Legal
---

## Product Terms of Service

[MotherDuck Product Terms of Service](https://motherduck.com/terms-of-service/)

[Products and Fees Addendum](https://motherduck.com/fees-addendum/)

[Acceptable Use Policy](https://motherduck.com/acceptable-use-policy/)

[Support Policy](https://motherduck.com/support-policy/)

---

---
sidebar_position: 1
title: Release notes
---

# Release Notes

Welcome to our release notes! We're excited to hear about your experience 😃

:::info
💁 If you have any issues, please reach out directly via our [Slack support channel](https://slack.motherduck.com/) or support@motherduck.com.
:::

## April 10, 2025

- MotherDuck supports DuckDB 1.2.2, a bugfix release. More details in the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.2.2).
- We've updated MotherDuck's timezone handling to use `UTC` as the default, replacing the prior `America/New_York` default. When converting values to the "[Timestamp with Time Zone](https://duckdb.org/docs/stable/sql/data_types/timestamp.html#time-zones)" type, UTC will now be applied by default. A custom timezone for the active connection can be set temporarily using the `SET TimeZone = '';` command ([see available timezone values](https://duckdb.org/docs/stable/sql/data_types/timezones.html)). Your DuckDB client's local timezone will still be used for other time-related query operations. For more details on DuckDB's timezone handling, see the [DuckDB Time Zone documentation](https://duckdb.org/docs/stable/sql/data_types/timestamp.html#time-zone-support).
- MotherDuck users can now specify an alias when [attaching a SHARE](https://motherduck.com/docs/key-tasks/sharing-data/sharing-overview/). Refer to the [documentation](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/attach-database/#troubleshooting) for more information and reach out to us in our [Community Slack](https://slack.motherduck.com) if you have any questions or feedback.

## April 3, 2025

- **Access Control for Shares**: MotherDuck users can now create shares with a [RESTRICTED](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-share/#access-clause) access setting, allowing share owners to precisely control access by granting or revoking permissions for individual MotherDuck users or a list of specified users through the [GRANT](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/grant-access/) and [REVOKE](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/revoke-access/) commands. When first created, a [RESTRICTED](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-share/#access-clause) share is only accessible by the share owner.
- **Manual Data Refresh for Read-Scaling Replicas**: MotherDuck users can now update data more frequently on [read-scaling replicas](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) by using the [CREATE SNAPSHOT OF](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-snapshot/) function to manually trigger snapshot creation, followed by [REFRESH DATABASE](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/refresh-database/) on the read-scaling replica. This provides access to the freshest data without waiting for automatic updates. Note that manual snapshot creation will hold back any new write queries on the read-write database from starting, in order to take the snapshot.
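A minimal sketch of that manual refresh flow, assuming a hypothetical database named `sales_db` (see the linked SQL reference pages for the exact syntax):

```sql
-- On the read-write connection: manually capture a fresh snapshot.
CREATE SNAPSHOT OF sales_db;

-- On the read-scaling replica connection: pick up the new snapshot.
REFRESH DATABASE sales_db;
```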
## March 20, 2025

- Users can now search and filter for notebooks, databases, and shares in the left sidebar using the object search in the top-left navigation.
- Introducing performance improvements to the databases section of the sidebar: the attached databases section now scales efficiently to handle very large numbers of databases, schemas, and tables.

## March 6, 2025

- MotherDuck now supports Indexes for query acceleration, in addition to their use in constraints. Learn more about DuckDB Indexes [here](https://duckdb.org/docs/stable/guides/performance/indexing.html#art-index-scans).
- MotherDuck supports DuckDB 1.2.1, a bugfix release. More details in the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.2.1).
- Support for DuckDB versions 0.10.2, 0.10.3, and 1.0.0 has ended.
- Introducing a smoother local file experience: persist files across sessions, view metadata directly in the Object Explorer, and convert files to tables.

## February 19, 2025

- Added [EXPLAIN ANALYZE](https://duckdb.org/docs/guides/meta/explain_analyze) support for profiling hybrid queries.
- Added a "Running Queries" page in settings to monitor active long-running queries.

## February 11, 2025

With today's release, we're introducing a number of features to support businesses building production-grade analytics. See the [blog post](https://motherduck.com/blog/introducing-motherduck-for-business-analytics/) for more details.

**New Plan Options:** MotherDuck now has two platform plans to choose from, **Lite** and **Business**, alongside our **Free** Plan.

* The **Free Plan** is designed for hobbyists and experimenters with small-scale analytics needs.
* The **Lite Plan** is most useful for individuals and small teams. Maybe your small team is building out some early analytics, or your hobby project is growing into something more.
* The **Business Plan** is ideal for businesses with complex needs and larger teams.

**[New instance](https://motherduck.com/docs/about-motherduck/billing/instances/) and compute pricing options:**

_**Pay Per Instance**_: We're adding new choices for MotherDuck compute, with Pay Per Instance **Standard** and **Jumbo** instances.

* The _Pay Per Instance_ model is based on uptime, which provides more predictable costs you can compare to other data warehouse products.
* The **Standard** instance is great for everyday tasks and balanced performance.
* The **Jumbo** instance is often useful for heavy workflows, like batch ETL pipelines or complex transformations.
* When you run a query, your instance spins up within milliseconds.
* You pay for the seconds that the instance is running, with a minimum of one minute.

_**Pay Per Query**_: Our existing instances are now called **Pulse**.

* These instances are capped in size; however, they are billed on our existing _Pay Per Query_ model, metered for billing in Compute Unit-seconds.
* The **Pulse** instance enables lightweight, fully serverless analytics.
* This can be very useful for applications where you have data partitioned by user, ad-hoc query execution, or incremental data processing with smaller data sizes.

**Read Scaling Controls:**

* Users with access to [Read Scaling](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling) in their organization can now set the Read Scaling replica pool size, letting you control the maximum concurrency threshold for your read replicas.
* Users can set their Read Scaling [Instance type](https://motherduck.com/docs/about-motherduck/billing/instances/) independently of the Read/Write Instance type.

## February 6, 2025

MotherDuck supports DuckDB's newly released version 1.2.0 🎉

DuckDB 1.2.0 is packed with improvements that make using MotherDuck even easier, like a better CSV reader, friendlier SQL, and improved performance! Read more about DuckDB 1.2.0 in the [MotherDuck Blog](https://motherduck.com/blog/announcing-duckdb-12-on-motherduck-cdw), and review the official [DuckDB Labs 1.2.0 announcement](https://duckdb.org/2025/02/05/announcing-duckdb-120.html) for notes on breaking changes and detailed updates.

## January 8, 2025

- MotherDuck clients now verify the server's TLS certificate.
- MotherDuck now automatically opens the browser to facilitate authentication in Windows environments.

## December 12, 2024

- [Preview] Introducing MotherDuck's REST API: Organizations with large numbers of users have struggled to manage them through the MotherDuck UI. We've received requests for a programmatic interface, and we've listened! We are launching a User Management REST API to provide support for managing Users and Access Tokens. Through the REST API, MotherDuck users can now easily create separate users for BI or data ingestion/processing workloads, and enable new experiences for app developers (e.g., issuing temporary, short-lived read-scaling tokens). See [the documentation](documentation/sql-reference/rest-api/motherduck-rest-api.info.mdx) for more information and reach out to us in our community Slack channel if you have any questions or feedback!

## December 4, 2024

- [Preview] Introducing support for read scaling: With the launch of read scaling tokens, MotherDuck accounts now support scaling up to 4 replicas of your database that can be read concurrently. When connecting with a read scaling token, each concurrent end user connects to a read scaling replica of the database that is served by its own duckling. See [our documentation](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for more information.
- Auto-sync of new and deleted attachments: users who connect to MotherDuck through two different clients concurrently (for example, the DuckDB CLI and the MotherDuck UI) will now see changes made in one client reflected in the other. For example, if you create a new database in the CLI, the MotherDuck UI will automatically be updated to reflect it, and vice versa. Similarly, database attach, detach, and delete operations will be synced.
- Create databases directly from the Object Explorer: users can now create a new attached database from the Object Explorer panel on the left side of the MotherDuck web UI. Previously you could only do so by issuing a SQL command.

## November 21, 2024

- Introducing the **Table Summary**. Customers have told us that they love the Column Explorer, but they wish there was an easy way to see it for tables in their database lists without having to write SQL. So we decided to build the Table Summary. You can activate it by clicking on a table or view in the Object Explorer, which will reveal a panel that shows the Column Explorer (the column names, types, distributions, and null percentages for the selected table or view). You can also get a quick preview of the table and see the DDL statement that defines it. We're excited to see how you use it!
- **A resizable, responsive Column Explorer**.
  To make the Table Summary work well, we made the Column Explorer both resizable and responsive. This also means the inspector (the right-side panel that expands to show the Column Explorer for your result sets) can be resized. As the panel gets smaller, we responsively hide the null percentage and the distribution plots, giving more room for the column name.
- Introducing the **[MD_INFORMATION_SCHEMA](documentation/sql-reference/motherduck-sql-reference/md_information_schema/introduction.md)**. The MotherDuck MD_INFORMATION_SCHEMA views are read-only, system-defined views that provide metadata about your MotherDuck objects. The current views that you can query to retrieve metadata are `databases`, `owned_shares`, and `shared_with_me`.

## November 7, 2024

- MotherDuck now supports DuckDB 1.1.3 clients, a bugfix release. More info in the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.1.3).
- DuckDB recently [introduced a change](https://github.com/duckdb/duckdb/pull/13372) that allows for much more efficient concurrent bulk ingestion. We completed the necessary infrastructure changes and collaborated on [some bug fixes](https://github.com/duckdb/duckdb/pull/14467), and that optimization is now enabled on our backends.

## October 31, 2024

- MotherDuck introduces `Admin` and `Member` roles for organizations. `Admin` users can change the roles of other users in the organization or [remove](documentation/key-tasks/managing-organizations/managing-organizations.mdx#removing-users) a user from the organization.
- MotherDuck & Hydra announced the first release of [pg_duckdb](https://github.com/duckdb/pg_duckdb), a PostgreSQL extension that allows you to run DuckDB (and connect to MotherDuck!) within PostgreSQL. Read more about it [here](https://motherduck.com/blog/pgduckdb-beta-release-duckdb-postgres/).

## October 17, 2024

- MotherDuck now supports DuckDB 1.1.2 clients, a bugfix release. More info in the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.1.2).

## October 14, 2024

- Shares now support [auto-updating](documentation/sql-reference/motherduck-sql-reference/create-share.md). Automatically updated shares no longer require running explicit UPDATE SHARE commands. Instead, changes on the underlying database are automatically published to the share within at most 5 minutes after writes have completed. However, the option for manually updating shares remains available and continues to be the default setting. This allows users who prefer finer control over their update lifecycle to maintain their usual workflow. The auto-updating property is defined at share creation time, and share owners can force an explicit update at any time on both types of shares by running [`UPDATE SHARE`](documentation/sql-reference/motherduck-sql-reference/update-share.md).

## October 9, 2024

We are excited to introduce a new SQL [prompt](/documentation/sql-reference/motherduck-sql-reference/ai-functions/prompt.md) function, currently in preview, that enables text generation directly within SQL queries. This feature leverages LLMs to process and generate text based on provided prompts.

Features:

* Text generation: Use the prompt function in your SQL queries to request text generation, for example, `SELECT prompt('Write a poem about ducks');`.
* Model selection: Specify the LLM model type with the model parameter. Available models include `gpt-4o-mini` (default) and `gpt-4o-2024-08-06`.
* Structured outputs: Opt for structured responses using the `struct` or `json_schema` parameters to tailor the output format to your needs.

Check out more snippets [here](/documentation/sql-reference/motherduck-sql-reference/ai-functions/prompt.md#text-generation).

## October 2, 2024

- MotherDuck now supports [monitoring](documentation/sql-reference/motherduck-sql-reference/connection-management/monitor-connections.md) and [interrupting](documentation/sql-reference/motherduck-sql-reference/connection-management/interrupt-connections.md) server-side queries.
- Various stability and usability improvements.

## September 25, 2024

- MotherDuck now supports DuckDB 1.1.1, a bugfix release. More info in the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.1.1).
- In the MotherDuck Web UI, users can easily view and copy the contents of a cell from their query results.

## September 16, 2024

MotherDuck now supports DuckDB version 1.1.0. 🎉 This release includes a number of new features and a lot of performance improvements. Here are some of the key updates (non-exhaustive):

**New features**

- [SQL variables](https://duckdb.org/2024/09/09/announcing-duckdb-110#friendly-sql)
- [query and query_table functions](https://duckdb.org/2024/09/09/announcing-duckdb-110#query-and-query_table-functions)
- [GeoParquet (Spatial extension features)](https://duckdb.org/2024/09/09/announcing-duckdb-110#spatial-features)

**Performance improvements**

- [Dynamic Filter Pushdown from Joins](https://duckdb.org/2024/09/09/announcing-duckdb-110#dynamic-filter-pushdown-from-joins)
- [Automatic CTE Materialization](https://duckdb.org/2024/09/09/announcing-duckdb-110#automatic-cte-materialization)
- [Parallel Streaming Queries](https://duckdb.org/2024/09/09/announcing-duckdb-110#automatic-cte-materialization)

Read more on [DuckDB's 1.1.0 blog](https://duckdb.org/2024/09/09/announcing-duckdb-110.html).

## September 5, 2024

- New MotherDuck users are optionally guided through running and analyzing a query upon first logging in to the Web UI.

## August 21, 2024

- MotherDuck now supports the [Full Text Search (FTS) extension](https://duckdb.org/docs/extensions/full_text_search.html). You can now create a text search index on tables in your MotherDuck databases and search them. (Note: creating the FTS index is currently not supported from the MotherDuck Wasm client and app.motherduck.com, but all other clients support it.)

## August 14, 2024

- MotherDuck now has an [embedding()](documentation/sql-reference/motherduck-sql-reference/ai-functions/embedding.md) function to compute `FLOAT[512]` text embeddings based on OpenAI's text-embedding-3-small model. Read more about it in our [announcement blog post](https://motherduck.com/blog/sql-embeddings-for-semantic-meaning-in-text-and-rag/)!
- MotherDuck now supports [sequences](https://duckdb.org/docs/sql/statements/create_sequence.html), with one small limitation: table column definitions that refer to a sequence by a fully qualified catalog name are rejected. Note that cross-catalog references are already disallowed by DuckDB.

## August 7, 2024

- MotherDuck now supports [foreign keys](https://duckdb.org/docs/sql/constraints.html#foreign-keys). Foreign keys define a column, or set of columns, that refer to a primary key or unique constraint in another table. The constraint enforces that the key exists in the other table.
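A small, hedged sketch of the constraint in action (the table names are hypothetical):

```sql
CREATE TABLE flocks (
    flock_id INTEGER PRIMARY KEY,
    name     VARCHAR
);

-- ducks.flock_id must reference an existing flocks.flock_id.
CREATE TABLE ducks (
    duck_id  INTEGER PRIMARY KEY,
    flock_id INTEGER REFERENCES flocks (flock_id)
);

-- Fails with a constraint violation, since flock 42 does not exist.
INSERT INTO ducks VALUES (1, 42);
```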
## July 24, 2024

- In the MotherDuck Web UI, users can now drop, rename, and comment on tables/views and columns from the Object Explorer.
- Users can now see the logical size of their MotherDuck databases using `FROM pragma_database_size()`.

## July 10, 2024

- **Access Tokens**: Users can now create multiple access tokens and revoke them as needed. Tokens can also be configured to expire after a set number of days. [Learn more](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck).
- **Organization domain invites**: Organizations can be configured such that anyone with the organization's email domain automatically receives an invitation upon signing up.
- **CREATE SHARE with conflict mode**: Database shares can be created with a conflict mode: if a share with the same name already exists, `IF NOT EXISTS` will not throw an error, and `OR REPLACE` will replace it with a new share.

## June 26, 2024

- **Delta Lake support**: You can now query Delta Lake tables in MotherDuck. [Learn more](/integrations/file-formats/delta-lake).
- In the MotherDuck Web UI, the Object Explorer interface (which catalogs shares and databases on the left side of the UI) has been revamped.
- ACH has been added as a billing method, in addition to credit card billing.
- Resolved an issue affecting large SQL queries in both the MotherDuck UI and the Wasm SDK.

## June 20, 2024

- New MotherDuck users are now treated to a "Welcome to MotherDuck!" notebook upon first logging on to the Web UI.
- In the MotherDuck Web UI, the legacy notebook called "My Notebook" can now be renamed and/or deleted, and notebooks can now be closed.
- In the MotherDuck Web UI, helpful links and drop-down menus have been improved.
- MotherDuck now supports DuckDB's [Spatial Extension](https://duckdb.org/docs/extensions/spatial.html). This extension comes pre-installed in MotherDuck, so users are not required to install it. The `GEOMETRY` type currently has one limitation: it does not render in the MotherDuck Web UI. More details to come.

## June 13, 2024

- Free Plan compute usage limits are now being enforced. Queries for users on the Free Plan may be throttled. [Learn more](/about-motherduck/billing/pricing#free-plan).

## June 11, 2024

- MotherDuck is now Generally Available!

## June 6, 2024

- MotherDuck now supports [organization-scoped and discoverable shares](/key-tasks/sharing-data/sharing-overview).
- MotherDuck now supports storing [Hugging Face type secrets](/sql-reference/motherduck-sql-reference/create-secret).

## June 3, 2024

- MotherDuck now supports DuckDB version 1.0.0. If you have upgraded to 0.10.2+, you can connect with clients of version 0.10.2, 0.10.3, or 1.0.0.

## May 30, 2024

- MotherDuck now supports DuckDB version 0.10.3. If you have upgraded to 0.10.2+, you can connect with clients of version 0.10.2 or 0.10.3.
- Added support to read datasets directly from Hugging Face (see the sketch after this list). Learn more about this new feature [here](https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb.html).
- Added support for the [ARRAY Type](https://duckdb.org/docs/sql/data_types/array.html#:~:text=Array%20Type%20%E2%80%93%20DuckDB&text=An%20ARRAY%20column%20stores%20fixed,ARRAY%20%2C%20LIST%20and%20STRUCT%20types.) in the MotherDuck UI.
- The MotherDuck UI now supports multiple notebooks.
- Fixed a bug in which running the `UPDATE SHARE` command would kill ongoing queries.
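A hedged sketch of reading a Hugging Face dataset directly (the dataset path is a placeholder; see the linked post for real examples):

```sql
-- hf:// paths let DuckDB read datasets hosted on Hugging Face over HTTP.
SELECT *
FROM 'hf://datasets/some-org/some-dataset/data.parquet'
LIMIT 10;
```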
## May 15, 2024

- MotherDuck now supports DuckDB 0.10.2. All new MotherDuck users default to DuckDB version 0.10.2, and all existing users can now permanently migrate to it. DuckDB version 0.10.2 features a large number of stability and performance improvements, and all users are encouraged to migrate.
- Starting with DuckDB 0.10.2, MotherDuck now supports multiple versions of DuckDB at once. For example, you could use DuckDB version 0.10.3 in the CLI and DuckDB version 1.0 in Python.
- MotherDuck now supports [Multi-Statement Transactions](https://duckdb.org/docs/sql/statements/transactions.html). You must be on DuckDB version 0.10.2 or above.
- MotherDuck now supports [Indexes](https://duckdb.org/docs/sql/indexes.html) for the purpose of enforcing `UNIQUE` or `PRIMARY KEY` constraints. For example, you can leverage `INSERT ON CONFLICT` to dedupe or upsert your data. [Learn more](https://duckdb.org/docs/sql/statements/insert#on-conflict-clause). Indexes are not yet utilized in MotherDuck for query acceleration.
- MotherDuck now supports Secrets syntax consistent with DuckDB 0.10 and above. [Learn more](/sql-reference/motherduck-sql-reference/create-secret).
- [FixIt](/getting-started/motherduck-quick-tour#writing-sql-with-confidence-using-fixit) is now 2-3x faster.
- Improved reliability of the service during releases. Moving forward, MotherDuck releases should not disrupt ongoing queries and workloads for users.

## May 8, 2024

- You can now preview DuckDB version 0.10.2 in MotherDuck.
- You can now [choose your organization's pricing plan](/about-motherduck/billing/managing-billing#choosing-your-billing-plan) using the [Plans](https://app.motherduck.com/settings/plans) page in the Settings section of the MotherDuck Web UI.
- You can now configure your organization's payment method in the [Billing](https://app.motherduck.com/settings/billing) page in the Settings section of the MotherDuck Web UI. Free Plan customers are not required to configure a payment method.

## May 1, 2024

- Fixed a bug in which MotherDuck releases would kill running queries. Releases no longer disrupt ongoing queries and workloads.
- A number of under-the-hood stability improvements.

## April 25, 2024

- Improved reliability of `ATTACH` operations.
- Various reliability and polish improvements.

## April 24, 2024

- The MotherDuck [Wasm SDK](/key-tasks/data-apps/wasm-client) is now available for app developers. Read more about the SDK in the [blog announcement](https://motherduck.com/blog/building-data-applications-with-motherduck/).

## April 17, 2024

- The [Billing Portal](./billing/managing-billing.mdx) is now available in the MotherDuck Web UI. You can use the Billing Portal to view your organization's incurred usage and current and past invoices.
- You can now invite your teammates to [Organizations](../key-tasks/managing-organizations/managing-organizations.mdx). Currently, Organizations are useful for grouping users together to monitor incurred usage in the Billing Portal; additional capabilities will land in the coming weeks.
- Fixed an issue in which MotherDuck releases would cancel running queries.

## April 10, 2024

- Catalog changes in one MotherDuck client will now automatically propagate to other clients.
- MotherDuck now supports indexes on temporary tables.

## March 20, 2024

- Fixed an issue in which users' runtimes could become unresponsive.
- In the MotherDuck UI, improved how row counts and query times are calculated.
- A variety of additional bug fixes and infrastructure-level improvements.
## March 7, 2024

- Operations on all databases that create shares (using `CREATE SHARE`), create databases (using `CREATE DATABASE`), or update shares (using `UPDATE SHARE`) are now metadata-only and copy no data.

## February 29, 2024

- A variety of fixes and improvements across the product.

## February 22, 2024

- Numerous bug fixes and stability improvements across the entire product.

## February 14, 2024

- In the MotherDuck web UI, you can now visualize your tables and query results with the [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/).
- For any database created starting today, operations on these databases that create shares (using `CREATE SHARE`), create databases (using `CREATE DATABASE`), and update shares (using `UPDATE SHARE`) are metadata-only and copy no data.

## February 13, 2024

- You are no longer required to provide a share name when creating shares. In this case, the created share will be named the same as the source database. For example, executing `CREATE SHARE FROM mydb` would create a share named `mydb`; if your current database is `db`, then `CREATE SHARE` would create a share named `db`. See the [`CREATE SHARE`](../sql-reference/motherduck-sql-reference/create-share.md) syntax.
- In the CLI or Python, MotherDuck no longer displays the authentication token by default. You can retrieve the authentication token by running [`PRAGMA PRINT_MD_TOKEN`](../sql-reference/motherduck-sql-reference/print-md-token.md).
- Support for DuckDB version 0.9.1 has ended.

## January 04, 2024

New Features:

- MotherDuck now supports [DuckDB macros](../sql-reference/duckdb-sql-reference/duckdb-statements/create-macro.md).
- MotherDuck now supports [DuckDB ENUM data types](../sql-reference/duckdb-sql-reference/enum.md).
- Fully qualified column names in SELECT clauses are now supported. For example:

```sql
SELECT schema.table.column FROM schema.table
```

Updates and Fixes:

- Fixed a bug in which prepared statements for INSERT operations did not work.
- In the MotherDuck web UI, data exports are now faster.
- Rolled out major infrastructure improvements in hybrid query execution, resulting in faster and more reliable hybrid queries.

## January 03, 2024

- [FixIt](../key-tasks/writing-sql-with-ai.md) is now available in the MotherDuck web UI. FixIt helps you resolve common SQL errors by suggesting AI-generated fixes line-by-line.

## November 30, 2023

- In the MotherDuck web UI, you can now copy query results to the clipboard or export query results as CSV, TSV, Parquet, or JSON files.

![Export query results](./img/release-notes-1.15.0-export.png)

- In the MotherDuck web UI, query error messages are now easier to read.

![Query error message](./img/release-notes-1.15.0-error-messages.png)

## November 15, 2023

- MotherDuck has been upgraded to DuckDB 0.9.2. You can use either DuckDB 0.9.1 or DuckDB 0.9.2, but not both, until December 6th.

## November 3, 2023

- You can now [query Iceberg tables](../integrations/file-formats/apache-iceberg.mdx) on object storage.
- Improved stability of share attaches.
- In the MotherDuck web UI, a new database selector now enables you to use a specific database for each notebook cell.

## October 25, 2023

- In the MotherDuck web UI, you can now move and reorder individual notebook cells.
- In the MotherDuck web UI, MotherDuck-specific SQL syntax is now highlighted.
- In the MotherDuck web UI, column histograms are now opt-in on a per-result basis, rather than a global opt-out via Settings.
- Improved how the MotherDuck web UI displays datetime data types, matching the formatting in the CLI.
- In the MotherDuck web UI, you can now easily copy-paste a rectangular selection of query results into Google Sheets or Excel.

## October 16, 2023

MotherDuck has been upgraded to DuckDB 0.9.1 :tada: Please see the migration guide for more info!

- You can now query Azure object storage. See the [documentation](../integrations/cloud-storage/azure-blob-storage.mdx) for more info.
- You can now easily load the AWS credentials you use locally into MotherDuck. See the [`CREATE SECRET`](../sql-reference/motherduck-sql-reference/create-secret.md) syntax for more info.
- Better performance and reliability with lower memory usage.
- More intelligent parsing of CSV files.

## September 21, 2023

- The MotherDuck web UI supports attaching and detaching databases, and shows detached databases.
- The MotherDuck web UI now loads significantly faster. This is an additional improvement over August 30, 2023.
- When a user updates a shared database, all consumers automatically receive the update within 1 minute.
- Support for `CREATE OR REPLACE DATABASE` and `CREATE IF NOT EXISTS DATABASE`.
- Fixed a bug in which queries with long commit times would result in the dreaded "`Invalid Error: RPC 'SETUP_PLAN_FRAGMENTS' failed: Deadline Exceeded (DEADLINE_EXCEEDED)`" error.
- Performance and stability of uploads has been improved.
- The MotherDuck web UI now displays decimals correctly.

## August 30, 2023

- The MotherDuck web UI now loads significantly faster.
- The MotherDuck web UI now supports autocomplete. As you write SQL in the UI, autocomplete brings up query syntax suggestions on every keystroke. You can turn off autocomplete in the Web UI settings, found under the gear icon in the top right.
- In the MotherDuck web UI, you can now execute multiple SQL statements in the same SQL cell.

## August 23, 2023

- Fixed a bug in which large uploads and downloads would fail.
- Improved performance of uploading data into MotherDuck from all supported sources.
- Added the [SHOW ALL DATABASES](../sql-reference/motherduck-sql-reference/show-databases.md) DDL command. This command enables you to list all database types, including MotherDuck databases, DuckDB databases, and databases that were created from shares.
- In the MotherDuck web UI, you can now cancel queries.

![cancel query](./img/release0823_1.png)

- In the MotherDuck web UI, you can now add JSON files and files with arbitrary extensions.
- In the MotherDuck web UI, under the 'Help' menu, you can now find the service-specific Terms of Service.

## August 17, 2023

- Numerous stability and performance improvements across the entire product.
- Added more descriptive error messages in a number of areas.
- Better timestamp support in the MotherDuck UI.

## August 01, 2023

- You can now copy a MotherDuck database through [CREATE DATABASE](/sql-reference/motherduck-sql-reference/create-database) using `CREATE DATABASE cloud_db FROM another_cloud_db`.
- Fixed an HTTPS certificate error that appeared on Windows machines when downloading/loading the MotherDuck extension through the CLI.
- Fixed a bug where [DESCRIBE SHARE](../sql-reference/motherduck-sql-reference/describe-share.md) was not returning the actual database name.

## July 26, 2023

- You can now use MotherDuck in the CLI or Python on the Windows operating system.
- The LIST and DESCRIBE SHARES SQL commands now return the database name instead of the snapshot name.
- Improved resilience of large uploads.
- Added more descriptive error messages for DDL queries.

## July 21, 2023

- Added DDL for [`DESCRIBE SHARE`](/sql-reference/motherduck-sql-reference/describe-share) and [`UPDATE SHARE`](/sql-reference/motherduck-sql-reference/update-share).
- Added DDL for [`CREATE [OR REPLACE] SECRET`](/sql-reference/motherduck-sql-reference/create-secret) and [`DROP SECRET`](/sql-reference/motherduck-sql-reference/delete-secret).
- Added `RESTRICT` and `CASCADE` options to the `DROP DATABASE` DDL. See the [documentation](/sql-reference/motherduck-sql-reference/drop-database).
- The current database, set with USE DATABASE, is now persisted across sessions in the web UI.
- Data uploads and downloads have been accelerated by roughly 3x by compressing data over the wire.
- Numerous stability and performance improvements across the entire product.
- Added more descriptive error messages in a number of areas.

## June 29, 2023

- You can now use AI to help you write SQL with the `prompt_sql` function, answer questions about your data with the `prompt_query` pragma, describe your data with the `prompt_schema` pragma, and fix your SQL with the `prompt_fixup` function. See the [documentation](/key-tasks/writing-sql-with-ai).

## June 27, 2023

- Added support for [`DROP SHARE [IF EXISTS]`](/sql-reference/motherduck-sql-reference/drop-share), [`LIST SHARES`](/sql-reference/motherduck-sql-reference/list-shares), and [`LIST SECRETS`](/sql-reference/motherduck-sql-reference/list-secrets) operations. Previously these operations were supported via table functions. The MotherDuck web UI now supports creating, deleting, and listing S3 secrets.
- Numerous improvements to the MotherDuck web UI.
- Fixed a bug in which the share URL was not returned after running the `CREATE SHARE` command in the CLI.
- Referencing database objects is now case-insensitive. For example, if a database `DuCkS` exists, you can now reference it as `ducks` or `DUCKS`. When listing databases, you will see `DuCkS`.

## June 23, 2023

- Numerous fixes to improve the stability and reliability of our authentication process and token expiry.
- In the MotherDuck web UI there is now a new drop-down menu on the User Profile (upper right) with options to access settings, send an invite, and log out.
- Added support for the `IF EXISTS` option to the `DROP DATABASE` SQL command. See the [documentation](/sql-reference/motherduck-sql-reference/drop-database).
- Added support for allowing the `motherduck_token` parameter in the connection string.
- Added the md_list_secrets() table function. Because MotherDuck currently only supports a single secret, this function returns either `TRUE` or `FALSE` depending on whether a secret exists. See the [documentation](/sql-reference/motherduck-sql-reference/list-secrets).
- Fixed a bug in the MotherDuck web UI where tables were rendered incorrectly.

## June 21, 2023

- In the MotherDuck web UI, the interactive query results panel now supports all DuckDB data types.
- Easier signup flow for new users.
- Performance of loading data into MotherDuck has been improved.
- Added support for `CREATE [OR REPLACE | IF NOT EXISTS] DATABASE` and `CREATE DATABASE FROM CURRENT_DATABASE()`.
- A concurrency issue on dropping and recreating shares has been resolved.
- Timeout handling for hybrid queries has been improved.
- The MotherDuck connection parameter `deny_local_access` has been renamed to `saas_mode` and now sets both the `enable_external_access=false` and `lock_configuration=true` DuckDB properties.
  In practice, this means that when connecting to MotherDuck with the `saas_mode=true` parameter, users will _not_ be able to read/write local files, read/write local DuckDB databases, install/load any extensions, or update any configuration. See the [documentation](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-saas-mode).
- Numerous other improvements.

## June 15, 2023

- MotherDuck now supports DuckDB [0.8.1](https://github.com/duckdb/duckdb/releases/tag/v0.8.1). Currently, MotherDuck only supports a single version of DuckDB at a time, so you must upgrade your DuckDB instances to 0.8.1.
- Performance of loading data into MotherDuck has been drastically improved.
- The database name in the `CREATE DATABASE` SQL command is now an identifier rather than a string literal: leave the database name unquoted, or double-quote it. For example:
  - Supported: `CREATE DATABASE ducks;`
  - Supported: `CREATE DATABASE "ducks";`
  - No longer supported: `CREATE DATABASE 'ducks';`
- You can now create a share using the `CREATE SHARE` statement, in addition to the previously supported table function `md_create_database_share()`:
  - Supported: `CREATE SHARE myshare FROM ducks;`
  - Supported: `CALL md_create_database_share('myshare', 'ducks');`
- You can now write data to S3 using the `COPY TO` command.
- In the web UI, entering and exiting full-screen mode has been simplified. You can also choose to display only the query editor or the query results using the overflow menu.
- In the web UI, you can now work with compound data types from JSON in interactive query results.
- You can now use both the lowercase and uppercase versions of the environment variable `motherduck_token` (e.g. `MOTHERDUCK_TOKEN`).

## June 7, 2023

- Views are now supported.
- Query results in the web UI are now interactive. Powered by [Tad](https://www.tadviewer.com/) and DuckDB in WASM, you can now quickly sort, filter, and pivot the results of a SQL query. Click on column headers to sort, or the pivot icon to open the control surface.

![query results](./img/release0607_1.png)

- Query results now include interactive column histograms for numeric columns. The gray background area of the column histogram is a brush that can be dragged to interactively filter results.

![query results 2](./img/release0607_2.png)

- The MotherDuck extension for the CLI and Python auto-updates itself. Users no longer need to run `FORCE INSTALL motherduck` to update their MotherDuck-powered DuckDB instances. Note: of course, to get this goodness, we ask you to run force install one last time.
- Various stability and usability improvements.

## May 31, 2023

**Summary**

- SQL queries in the web UI are now automatically saved in local storage in your web browser and restored when you reload the page.
- The MotherDuck extension is now available for Linux on ARM64!
- Added support for the [ON CONFLICT](https://duckdb.org/docs/sql/statements/insert.html#on-conflict-clause) clause.
- New setting `deny_local_access` to lock down filesystem access and extension loading (note: does not prevent DuckDB database access).

## May 24, 2023

**Summary**

- Various stability improvements and bug fixes.

## May 22, 2023

**Summary**

- The MotherDuck service is upgraded to DuckDB 0.8.0.
- Catalog schemas are now supported.
- Querying `md_databases()` no longer returns snapshots.
- Shares that you create are no longer auto-attached. As the creator, you can attach them via `attach `.
- Various stability improvements and bug fixes.

**_Known issues_**

- Some shares appear as "empty" databases.
  Please report to [support@motherduck.com](mailto:support@motherduck.com) if you spot a sharing issue.

## May 17, 2023

- The DuckDB ICU [extension](https://duckdb.org/docs/extensions/overview.html#all-available-extensions) is now enabled by default. This extension adds support for time zones and collations using the ICU library.
- The web UI now displays your avatar instead of your initials in the user menu.
- The first database alphabetically is now used for querying by default in the web UI. CLI behavior has not changed: if you don't pass a specific database through the connection string, the default database _my_db_ will be used for querying. NOTE: this will change once we upgrade to the just-released DuckDB 0.8.0.
- The output of query `EXPLAIN` is now more user-friendly.
- Various stability improvements and bugfixes.

## May 5, 2023

- Fixed a bug in which users were unable to supply the authentication token in-line in the connection string, for instance `.open md:?token=123123` or `duckdb md:?token=3333`.
- DELETE and UPDATE table operations are now supported.

## May 3, 2023

- Stability of DML and DDL operations has been greatly improved.
- Hybrid query execution has been upgraded to execute many query types more efficiently.
- You can now upload your current DuckDB database using the `CREATE DATABASE FROM 'CURRENT_DATABASE'` operation.
- In the web UI, you can now find a link to MotherDuck's technical documentation.
- In the web UI, you can now upload files from your local computer to MotherDuck.
- In programmatic interfaces (JDBC, CLI, Python), you can now connect to a specific database using the syntax `md:` or `motherduck:`.
- MotherDuck now creates a default database called `my_db` for you. This is the database you connect to if you do not specify a database when connecting to MotherDuck.

## April 26, 2023

- You can now work with multiple databases, cloud or local, and query across them.
- You can now save your S3 credentials in MotherDuck using the MD_CREATE_SECRET operation.
- You can now upload DuckDB databases to MotherDuck using the CREATE DATABASE FROM operation.
- The MotherDuck UI now has an improved notebook experience.

## April 19, 2023

- Various stability, performance, and UI improvements.

## April 12, 2023

- The JSON extension to DuckDB is now pre-installed automatically in the web UI.
- The table viewer component in the Web UI is now a simple table (rather than an interactive pivot table). This should greatly improve time to first render on query results, especially for small queries. We plan to re-enable the pivot table in an upcoming release, once some underlying performance issues are resolved.
- The duck feet are paddling very hard underwater (numerous stability and performance improvements).

## March 30, 2023

- Fixed: [auto-detection of the schema of a .csv fails in WASM](https://lindie.app/share/92ac65cc6e006bff2fb60417388294965ef2d4c7).
- Fixed: intermittent "Error reading catalog: Cancelling all calls" error.
- Numerous stability and performance improvements.

## March 22, 2023

- The CLI uses the same database by default as the web app (first sorted alphabetically).
- Multiple improvements in the MotherDuck UI.
- Numerous stability and performance improvements.
- Enabled query EXPLAIN for queries that execute in hybrid mode.

## March 8, 2023

- Numerous stability and performance improvements.
- Vastly improved performance of loading multiple CSVs in the same command.
- Fixed a bug in the CLI in which authentication via browser would fail.

## March 1, 2023

> Even more goodies!
- Delivered major improvements to hybrid execution, resulting in better efficiency, stability, and performance.
- Fixed a bug in the UI in which dropping and creating a database with the same name displayed incorrect information.
- Migrated to DuckDB 0.7.1.
- Fixed an error message when running MotherDuck commands in the CLI without running `.open`.

## January 26, 2023

> We're back with more exciting improvements!

- Addressed server timeouts associated with long-running queries. We are still triaging other potential issues with long-running queries, but network-tier issues should be mitigated to a large degree.
- Empty databases now appear in the catalog in the UI.
- Added an `MD_VERSION` pragma function.
- Implemented the OAuth sign-in flow from the native client.
- Upgraded MotherDuck-hosted DuckDB to version 0.6.1.
- Fixed a number of bugs across the entire service.

## December 23, 2022

> Our first release! Duckie's first steps 🦆

---

---
sidebar_position: 1
title: Architecture and capabilities
---

import Image from '@theme/IdealImage';
import Versions from '@site/src/components/Versions';

MotherDuck is a serverless cloud analytics service with a unique architecture that combines the power and scale of the cloud with the efficiency and convenience of DuckDB. MotherDuck's key components are:

- The MotherDuck cloud service
- MotherDuck's DuckDB SDK
- Dual Execution
- The MotherDuck web UI

![Architecture](./../img/md-diagram_v1.3.png)

### The MotherDuck cloud service

The MotherDuck cloud service enables you to store structured data, query that data with SQL, and share it with others. A key MotherDuck product principle is ease of use.

**Serverless execution model**: You don't need to configure or spin up instances, clusters, or warehouses. You simply write and submit SQL. MotherDuck takes care of the rest. Under the hood, MotherDuck runs DuckDB and speaks DuckDB's SQL dialect.

**Managed storage**: You can load data into MotherDuck storage to be queried or shared. MotherDuck storage is durable, secure, and automatically optimized for best performance. MotherDuck storage is surfaced to you via the **catalog** and its logical primitives: database, schema, table, view, etc. In addition, MotherDuck can query data outside of MotherDuck storage, such as data on Amazon S3, data behind HTTPS endpoints, data on your laptop, and so on.

**The service layer**: MotherDuck provides key capabilities like secure identity, authorization, administration, monitoring, and so on.

:::note
Currently, MotherDuck runs in the AWS `us-east-1` region. We are working on expanding to other regions and cloud providers.
:::

### MotherDuck's DuckDB SDK

If you're using DuckDB in Python or the CLI, you can connect to MotherDuck with a single line of code: `ATTACH 'md:';`. After you run this command, your DuckDB instance becomes supercharged by MotherDuck. MotherDuck's Dual Execution is enabled, and your DuckDB instance gets additional capabilities like sharing, secrets storage, better interoperability with S3, and cloud persistence.
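For example, from the DuckDB CLI or a Python session, a minimal sketch (the table name is hypothetical, and a `motherduck_token` is assumed to be configured):

```sql
-- Connect this DuckDB instance to MotherDuck.
ATTACH 'md:';

-- Query cloud-resident data alongside a local file in the same session.
SELECT count(*) FROM my_db.main.orders;
SELECT * FROM 'local_orders.csv' LIMIT 5;
```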
### Dual Execution

When connected together, DuckDB and MotherDuck form a different type of distributed system. The two nodes work in concert so you can query data wherever it lives, in the most efficient way possible.

This query execution model, called **Dual Execution** (formerly known as Hybrid Execution), automatically routes the various stages of query execution to the most opportune locations, even in arbitrary scenarios:

- If a SQL query reads data on your laptop, MotherDuck routes the query to your local DuckDB instance.
- If a SQL query reads data in MotherDuck or S3, MotherDuck routes that query to MotherDuck.
- If a SQL query executes a join between data on your laptop and data in MotherDuck, MotherDuck finds the best way to efficiently join the two.

### The MotherDuck web UI

You can use MotherDuck's web UI to analyze and share data and to perform administrative tasks. Currently, MotherDuck's UI consists of a lightweight notebook, a SQL IDE, and a data catalog. Uniquely, MotherDuck caches query results in a highly interactive query results panel, enabling you to sort, filter, and even pivot data quickly.

## Summary of capabilities

Currently with MotherDuck you can:

- Use serverless DuckDB in the cloud to store data and execute DuckDB SQL
- Load data into MotherDuck from your personal computer, HTTPS endpoints, or S3
- Join datasets on your computer with datasets in MotherDuck or in S3
- Copy DuckDB databases between local and MotherDuck locations
- Materialize query results into local or MotherDuck locations, or S3
- Work with data in MotherDuck's notebook UI, the standard DuckDB CLI, or the standard DuckDB Python package
- Share databases with your teammates
- Securely save S3 credentials in MotherDuck

Additionally, MotherDuck supports connectivity to third-party tools via:

- JDBC
- Go
- SQLAlchemy

## Considerations and limitations

:::caution
MotherDuck currently supports DuckDB and is compatible with any client version through
:::

MotherDuck does not yet support the full range of DuckDB's SQL. We are continuously working on improving coverage of DuckDB in MotherDuck. If you need specific features enabled, please let us know. Below is the list of DuckDB features that MotherDuck does not yet support:

- Custom Python / native user-defined functions.
- Server-side attach of Postgres, SQLite, etc.
- Custom or community extensions.
- The Appender API.

---

---
title: Concepts
description: Concepts
sidebar_class_name: architecture-icon
---

This section contains a collection of high-level views of concepts & features.

import DocCardList from '@theme/DocCardList';

---

---
sidebar_position: 2
title: Building a Data Warehouse with MotherDuck
sidebar_label: Data Warehousing
description: Learn to use MotherDuck as a cloud data warehouse
---

## The MotherDuck Ecosystem

![img_duck_stack](./../img/md-diagram.svg)

Please do not hesitate to **[contact us](https://motherduck.com/customer-support/)** if you need help along your journey.

## MotherDuck Architectural Concepts

:::note
MotherDuck is a cloud-native data warehouse, built on top of DuckDB, a fast in-process analytical database. It inherits some features from DuckDB that present opportunities to think differently about data warehousing methods in order to achieve high levels of performance and simplify the experience.
:::

- **Isolated Compute Tenancy**: Each user is allocated their own "duckling," an isolated piece of compute that sits on top of the MotherDuck storage layer. MotherDuck is designed this way to lessen contention between users, which is a common challenge with other data warehouses.
- **Aggressively Serverless**: Unlike conventional data warehouses, DuckDB automatically parallelizes the work that you send to it.
The implication of this is that scheduling multiple queries at a time does not meaningfully increase throughput, as DuckDB has already parallelized the workload across all available resources.
- **Database-level security model**: MotherDuck has a simplified access model: users either have access to an entire database, or none at all. As a result, users frequently interact with data at the database level. This is unusual compared to other databases, which often treat multiple database files as a single concept from an interactivity perspective.
- **Database Sharing**: MotherDuck separates storage and compute, which means that one user cannot see another's writes to a database until the share is updated for that user. MotherDuck has its own concept called [“SHARES”](/key-tasks/sharing-data/sharing-overview/) within Organizations, which are zero-copy clones of the main database for read-only use, enabling high scalability of analytics workloads.
- **Dual Execution**: Every MotherDuck client is also a DuckDB engine, so you can efficiently query local data and combine it (JOIN, UNION) with data that's stored in your MotherDuck data warehouse. [The query planner automatically decides](/concepts/architecture-and-capabilities#dual-execution) the best place to execute each part of your query.

## Data Ingestion

An easy way to get data into MotherDuck is using [ecosystem partners](/integrations/#data-ingestion-tools) like [Fivetran](https://fivetran.com/docs/destinations/motherduck), [dlthub](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck), and [Airbyte](https://docs.airbyte.com/integrations/destinations/duckdb), but you can also create custom data engineering pipelines. MotherDuck is very flexible with how you load your data:

- **From data you have on your filesystem:** If you have CSVs, JSON files, or DuckDB databases sitting around, it's easy to load them into your MotherDuck data warehouse.
- **From a data lake on a cloud object store:** If you already have your data in a data lake, as Parquet, Delta, Iceberg, or other formats, DuckDB has abstractions for Secrets, Object Storage, and many file types. Combined, this means that many file types can be read into DuckDB from object storage with only SQL. Though not as performant, you can also query your infrequently-accessed data directly from your data lake with MotherDuck.
- **Using Native APIs in many languages:** DuckDB supports numerous languages such as C++, Python, and Java, in addition to its own mostly Postgres-compatible SQL dialect. Using these languages, data engineers and developers can easily integrate with MotherDuck without having to pick up yet another language.

#### Best Practices for Programmatic Loading

The fastest way to load data is to load single tables in large batches, saturating the network connection between MotherDuck and the source data. DuckDB is incredibly good at handling both files and some kinds of in-memory objects, like Arrow dataframes. As an aside, Parquet files compress at 5-10x compared to CSV, which means you can get 5-10x more throughput simply by using Parquet files. Similarly, open table formats like Delta and Iceberg share those performance gains.

On the other hand, small writes on multiple tables will lead to suboptimal performance. While MotherDuck does indeed offer [ACID compliance](https://duckdb.org/2024/09/25/changing-data-with-confidence-and-acid.html), it is not an OLTP system like Postgres! Significantly better performance can be achieved by using queues to batch writes to tables, as illustrated below.
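As a minimal sketch of that pattern, a single bulk `INSERT` that drains a batch of queued files is far cheaper than thousands of row-level writes (the bucket and path names below are hypothetical):

```sql
-- one large append per drained batch, instead of many single-row INSERTs
INSERT INTO events
SELECT * FROM read_parquet('s3://my-queue-bucket/drained-batches/batch_0042/*.parquet');
```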
While queue-based batching introduces some latency, the improvement in throughput should far outweigh the cost of making many small writes. In short, streaming workloads are best handled by placing a queue in front of MotherDuck.

## Transforming Data

Once data is loaded into MotherDuck, it must be transformed into a model that matches the business purpose and needs. This can be done directly in MotherDuck using the powerful library of SQL functions offered by [DuckDB](https://duckdb.org/docs/sql/introduction.html).

Many data engineers prefer to use data transformation tools like the open-source [dbt Core](https://github.com/dbt-labs/dbt-core). More details about using dbt with MotherDuck can be found in the [blog on this topic](https://motherduck.com/blog/duckdb-dbt-e2e-data-engineering-project-part-2/).

For more in-depth reading, the free **[DuckDB in Action eBook](https://motherduck.com/duckdb-book-brief/)** explores these concepts with real-world examples.

## Sharing Data

Once your data is loaded into MotherDuck and appropriately transformed for use by your analysts, you can make that data available using MotherDuck's [sharing capabilities](/key-tasks/sharing-data/sharing-overview/). This allows every user in your organization to access the data warehouse in the MotherDuck UI, in their Python code, or with other tools. Admins don't need to worry that the queries run by users will impact their data pipelines, as users have isolated compute.

## Serving Data Analytics

Do you want to serve reports or dashboards for your users? MotherDuck provides tokens that can be used with [popular tools](./../../integrations/#business-intelligence-tools) like Tableau and Power BI to access your data warehouse and serve business intelligence to end users.

#### Ducks all the Way Down: Building Data Apps

MotherDuck is built on DuckDB because it is an extremely efficient SQL engine inside a ~20MB executable. This allows you to run the same DuckDB engine that powers your data warehouse inside your web browser, creating highly interactive visualizations with near-zero latency. This enhances your experience when using the [Column Explorer](/getting-started/motherduck-quick-tour/#diving-into-your-data-with-column-explorer) in the MotherDuck UI.

One thing that is unique to MotherDuck is its capability for serving data into the web layer via [WASM](/key-tasks/data-apps/wasm-client/). These capabilities enable novel analytical user actions, including very intensive queries that would be prohibitively expensive in other query engines. It also supports data mashup from various sources, so that data in the warehouse can easily be combined with other sources, like files in CSV, JSON, or Parquet.

## Orchestration

To keep data up to date inside of MotherDuck, an orchestrator like [Airflow](https://airflow.apache.org/) or [Dagster](https://dagster.io/) is often used. An orchestrator runs jobs in a specific order to load and transform data, as well as managing workflow and observability, which is necessary for handling more complex data engineering pipelines. If this is your first data warehouse, you might consider starting with something as simple as [GitHub Actions](https://github.com/features/actions) or cron jobs to orchestrate your data pipelines.

## Need Help Along the Way?

Please do not hesitate to **[contact us](https://motherduck.com/customer-support/)** if you need help along your journey. We are here to help you succeed with your data warehouse!
---

---
sidebar_position: 3
title: pg_duckdb Extension
---

[pg_duckdb](https://github.com/duckdb/pg_duckdb) is an open-source Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. Main features include:

- SELECT queries executed by the DuckDB engine can directly read Postgres tables
- Read and write support for object storage (AWS S3, Cloudflare R2, or Google GCS)
- Read and write support for data stored in MotherDuck

For more information about functionality and installation, check out the [repository's README](https://github.com/duckdb/pg_duckdb/blob/main/README.md).

## Connect with MotherDuck

To enable this support you first need to [generate an access token][md-access-token] and then add the following line to your `postgresql.conf` file:

```ini
duckdb.motherduck_token = 'your_access_token'
```

NOTE: If you don't want to store the token in your `postgresql.conf` file, you can also store it in the `motherduck_token` environment variable and then explicitly enable MotherDuck support in your `postgresql.conf` file:

```ini
duckdb.motherduck_enabled = true
```

If you installed `pg_duckdb` in a different Postgres database than the default one named `postgres`, then you also need to add the following line to your `postgresql.conf` file:

```ini
duckdb.motherduck_postgres_database = 'your_database_name'
```

After doing this (and possibly restarting Postgres), you can create tables in the MotherDuck database by using the `duckdb` [Table Access Method][tam] like this:

```sql
CREATE TABLE orders(id bigint, item text, price NUMERIC(10, 2)) USING duckdb;
CREATE TABLE users_md_copy USING duckdb AS SELECT * FROM users;
```

[tam]: https://www.postgresql.org/docs/current/tableam.html

Any tables that you already had in MotherDuck are automatically available in Postgres. Since DuckDB and MotherDuck allow accessing multiple databases from a single connection and Postgres does not, we map a database+schema pair in DuckDB to a schema name in Postgres. This is done in the following way:

1. Each schema in your default MotherDuck database is simply merged with the Postgres schema of the same name.
2. The exception is the `main` DuckDB schema in your default database, which is merged with the Postgres `public` schema.
3. Tables in other databases are put into dedicated DuckDB-only schemas. These schemas are of the form `ddb$<db_name>$<schema_name>` (including the literal `$` characters).
4. The exception is the `main` schema in those other databases, which should be accessed using the shorter name `ddb$<db_name>` instead.
An example of each of these cases is shown below:

```sql
INSERT INTO my_table VALUES (1, 'abc'); -- inserts into my_db.main.my_table
INSERT INTO your_schema.tab1 VALUES (1, 'abc'); -- inserts into my_db.your_schema.tab1
SELECT COUNT(*) FROM ddb$my_shared_db.aggregated_order_data; -- reads from my_shared_db.main.aggregated_order_data
SELECT COUNT(*) FROM ddb$sample_data$hn.hacker_news; -- reads from sample_data.hn.hacker_news
```

[md]: https://motherduck.com/
[md-access-token]: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token

---

---
sidebar_position: 3
title: Installing and Using the DuckDB CLI
sidebar_label: Using the DuckDB CLI
description: Learn to connect and query databases using MotherDuck from the DuckDB CLI
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import DownloadLink from '@site/src/components/DownloadLink';
import Versions from '@site/src/components/Versions';

## Installation

:::caution
MotherDuck currently supports DuckDB and it is compatible with any client version through
:::

Download and install the DuckDB binary, depending on your operating system.

**Windows**

1. Download the 64-bit Windows binary
2. Extract the ZIP file.

**macOS**

There are two recommended options for installing on macOS: you can install with Homebrew or download the binary.

### Install with Homebrew

To install DuckDB, you can use the following command in the Terminal:

```sh
brew install duckdb
```

### Alternative: download the binary

1. Download the binary
2. Extract the ZIP file.

**Linux**

1. Download the Linux binary:
   - For 64-bit, download the binary
   - For arm64/aarch64, download the binary
2. Extract the ZIP file.

For more information, see the [DuckDB installation documentation](https://duckdb.org/docs/installation/).

## Run the DuckDB CLI

Run DuckDB using the command:

```sh
./duckdb
```

By default, DuckDB will start with an in-memory database and any changes will not be persisted. To create a persistent database in the DuckDB CLI, you can specify a new filename as the first argument to the `duckdb` command. Example:

```sh
./duckdb mydatabase.ddb
```

## Connect to MotherDuck

You can connect to MotherDuck by executing the following in the DuckDB CLI. DuckDB will automatically download and load the signed MotherDuck extension.

```sql
ATTACH 'md:';
```

DuckDB will prompt you to authenticate with MotherDuck using your default web browser. Follow the instructions displayed in the terminal.

Test your MotherDuck connection using the following command. It will run in the cloud and display a list of your MotherDuck databases.

```sql
SHOW DATABASES;
```

Congrats 🎉 You are connected! Now you can create databases and switch between them. You can also connect to your local DuckDB databases alongside databases hosted in MotherDuck, and interact with both!
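For example, a minimal session mixing the two might look like this (the file name `local.duckdb` is hypothetical):

```sql
ATTACH 'md:';                      -- connect to MotherDuck
ATTACH 'local.duckdb' AS local_db; -- attach a local DuckDB file alongside it
SHOW DATABASES;                    -- lists both your local and MotherDuck databases
```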
To learn more about how to persist your authentication credentials, read [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).

:::info
Note you can also connect to MotherDuck directly when starting the DuckDB CLI by running the following command:

```bash
duckdb "md:"
```
:::

## Accessing the MotherDuck UI from the CLI

You can access the MotherDuck UI from the CLI by executing the following command in the terminal:

```bash
duckdb -ui
```

If you are already in a DuckDB session, you can instead use `CALL start_ui();`

## Upgrading MotherDuck via the DuckDB CLI

If you have previously installed the extension, but we have upgraded the service, you may need to run the `FORCE INSTALL` command as shown in the following example.

```sql
FORCE INSTALL motherduck;
```

---

---
sidebar_position: 2
title: Specify MotherDuck database
description: Specify MotherDuck database
---

When you connect to MotherDuck you can specify a database name, or omit the database name and connect to the default database.

- If you use `md:` without a database name, you connect to a default MotherDuck database called `my_db`.
- If you use `md:<database_name>`, you connect to the `<database_name>` database.

After you establish the connection, either the default database or the one you specify becomes the current database. You can run the `USE` command to switch the current database, as shown in the following example.

```python
# list the current database
con.sql("SELECT current_database()").show()  # ('database1')

# switch the current database to database2
con.sql("USE database2")
```

To query a table in the current database, you can specify just the table name. To query a table in a different database, you can include the database name when you specify the table. You don't need to switch the current database. The following examples demonstrate each method.

```python
# querying a table in the current database
con.sql("SELECT count(*) FROM mytable").show()

# querying a table in another database
con.sql("SELECT count(*) FROM another_db.another_table").show()
```

---

---
sidebar_position: 1
title: Installation & authentication
description: How to install DuckDB and connect to MotherDuck
---

import Versions from '@site/src/components/Versions';
import { duckdb } from '@site/src/components/Versions';

## Prerequisites

MotherDuck Python supports the following operating systems:

- Linux (x64, glibc v2.31+, equivalent to Ubuntu v20.04+)
- Mac OSX 11+ (M1/ARM or x64)
- Python 3.4 or later

Please let us know if your configuration is unsupported.

## Installing DuckDB

:::caution
MotherDuck currently supports DuckDB and it is compatible with any client version through
:::

Use the following `pip` command to install the supported version of DuckDB:

{`pip install duckdb==${ duckdb }`}
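Once installed, one quick way to confirm the client works is to ask DuckDB for its version (a minimal check; the exact output depends on the installed version). The statement below can be passed to `duckdb.sql(...)` in Python:

```sql
SELECT version();
```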
## Connect to MotherDuck

You can connect to and work with multiple local and MotherDuck-hosted DuckDB databases at the same time. Currently, the connection syntax varies depending on how you're opening local DuckDB and MotherDuck.

### Authenticating to MotherDuck

You can authenticate to MotherDuck using either browser-based authentication or an access token. Here are examples of both methods:

#### Using browser-based authentication

```python
import duckdb

# connect to MotherDuck using 'md:' or 'motherduck:'
con = duckdb.connect('md:')
```

When you run this code:

1. A URL and a code will be displayed in your terminal.
2. Your default web browser will automatically open to the URL.
3. You'll see a confirmation request to approve the connection.
4. Once approved, if you're not already logged in to MotherDuck, you'll be prompted to do so.
5. Finally, you can close the browser tab and return to your Python environment.

This method is convenient for interactive sessions and doesn't require managing access tokens.

#### Using an access token

For automated scripts or environments where browser-based auth isn't suitable, you can use an access token:

```python
import duckdb

# Initiate a MotherDuck connection using an access token
con = duckdb.connect('md:?motherduck_token=<your_token>')
```

Replace `<your_token>` with an actual token generated from the MotherDuck UI. To learn more about creating and managing access tokens, as well as other authentication options, see our guide on [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).

### Connecting to MotherDuck

Once you've authenticated, you can connect to MotherDuck and start working with your data. Let's look at a few common scenarios.

#### Connecting directly to MotherDuck

Here's how to connect to MotherDuck and run a simple query:

```python
import duckdb

# Connect to MotherDuck via browser-based authentication
con = duckdb.connect('md:my_db')

# Run a query to verify the connection
con.sql("SHOW DATABASES").show()
```

:::tip
When connecting to MotherDuck, you need to specify a database name (like `my_db` in the example). If you're a new user, a default database called `my_db` is automatically created when your account is first set up. You can query any table in your connected database by just using its name. To switch databases, use the `USE` command.
:::

#### Working with both MotherDuck and local databases

MotherDuck allows you to work with both cloud and local databases simultaneously. Here's how:

```python
import duckdb

# Connect to MotherDuck first, specifying a database
con = duckdb.connect('md:my_db')

# Then attach local DuckDB databases
con.sql("ATTACH 'local_database1.duckdb'")
con.sql("ATTACH 'local_database2.duckdb'")

# List all connected databases
con.sql("SHOW DATABASES").show()
```

#### Adding MotherDuck to an existing local connection

If you're already working with a local DuckDB database, you can easily add a MotherDuck connection:

```python
import duckdb

# Start with a local DuckDB database
local_con = duckdb.connect('local_database.duckdb')

# Add a MotherDuck connection, specifying a database
local_con.sql("ATTACH 'md:my_db'")
```

This approach gives you the flexibility to work with both local and cloud data in the same session.
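For instance, once a local database is attached next to `my_db`, a single query can join across both; a sketch, assuming hypothetical `orders` and `customers` tables (pass the statement to `local_con.sql(...)`):

```sql
-- joins a table in the local database with a table in MotherDuck
SELECT c.name, count(*) AS order_count
FROM local_database.main.orders AS o
JOIN my_db.main.customers AS c ON o.customer_id = c.id
GROUP BY c.name;
```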
---

---
sidebar_position: 3
title: Loading data into MotherDuck with Python
sidebar_label: Loading data into MotherDuck
---

## Copying a table from a local DuckDB database into MotherDuck

You can currently use `CREATE TABLE AS SELECT` to load CSV, Parquet, and JSON files into MotherDuck from local, Amazon S3, or https sources, as shown in the following examples.

```python
# load from local machine into table mytable of the current/active database
con.sql("CREATE TABLE mytable AS SELECT * FROM '~/filepath.csv'")

# load from an S3 bucket into table mytable of the current/active database
con.sql("CREATE TABLE mytable AS SELECT * FROM 's3://bucket/path/*.parquet'")
```

If the source data matches the table's schema exactly, you can also use `INSERT INTO`, as shown in the following example.

```python
# append to table mytable in the currently selected database from S3
con.sql("INSERT INTO mytable SELECT * FROM 's3://bucket/path/*.parquet'")
```

## Copying an entire local DuckDB database to MotherDuck

MotherDuck supports copying your currently opened DuckDB database into a MotherDuck database. The following example copies a local DuckDB database named `localdb` into a MotherDuck-hosted database named `clouddb`.

```python
# open the local db
local_con = duckdb.connect("localdb.ddb")

# connect to MotherDuck
local_con.sql("ATTACH 'md:'")

# FROM CURRENT_DATABASE() copies the currently opened local database
local_con.sql("CREATE DATABASE clouddb FROM CURRENT_DATABASE()")
```

A local DuckDB database can also be copied by its file path:

```python
local_con = duckdb.connect("md:")
local_con.sql("CREATE DATABASE clouddb FROM 'localdb.ddb'")
```

See [Loading Data into MotherDuck](/key-tasks/loading-data-into-motherduck/loading-data-into-motherduck.mdx) for more detail.

---

---
sidebar_position: 4
title: Query data
---

For more information about database manipulation, see [MotherDuck SQL reference](/category/motherduck-sql). MotherDuck uses DuckDB under the hood, so nearly all [DuckDB SQL](https://duckdb.org/docs/) works in MotherDuck without differences.

MotherDuck leverages “hybrid execution” to decide the best location to execute queries, including across multiple locations. For example, if your data lives on your laptop, MotherDuck executes queries against that data on your laptop. Similarly, if you are joining data on your laptop to data on Amazon S3, MotherDuck executes each part of the query where the data lives before bringing it together to be joined locally.

## Querying data in MotherDuck

You can query data loaded into MotherDuck the same way you query data in your DuckDB databases. MotherDuck executes these queries using resources in the cloud.

```python
# table table_name is in MotherDuck storage
con.sql("SELECT * FROM table_name").show()
```

## Querying data on your machine

You can use MotherDuck to query files on your local machine. These queries execute using your machine's resources.

```python
# query a Parquet file on your local machine
con.sql("SELECT * FROM '~/file.parquet'").show()

# query a table in a local DuckDB database
con.sql("SELECT * FROM local_table").show()
```

## Joining data across multiple locations

You can use MotherDuck to join data:

- In MotherDuck
- On S3 or other cloud object stores (Azure, GCS, R2, etc.)
- On your local machine

## What's next?

Ready to share your DuckDB data with your colleagues? Read up on [Sharing in MotherDuck](/key-tasks/sharing-data/sharing-data.mdx).
---

---
sidebar_position: 1
title: MotherDuck and DuckDB Tutorial
sidebar_label: Tutorial
description: Learn MotherDuck and DuckDB through a small end-to-end guide
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Versions from '@site/src/components/Versions';

In this tutorial, you will go through a full end-to-end example of how to use MotherDuck and DuckDB, **push** and **share** data, take advantage of **hybrid query** execution, and query data using SQL through the **MotherDuck UI** or **DuckDB CLI**.

:::note
MotherDuck currently supports DuckDB and it is compatible with any client version through
:::

## Prerequisites

For this tutorial, you will need:

* [A MotherDuck account](https://app.motherduck.com/). Examples will cover using the MotherDuck UI and the DuckDB CLI
* A CSV dataset. You can download our example CSV dataset hosted on our public AWS S3 bucket [here](https://us-prd-motherduck-open-datasets.s3.amazonaws.com/misc/csv/popular_currency_rate_dollar_20230620.csv). Feel free to use your own CSV/Parquet data!
* The DuckDB CLI for [running a hybrid query](#running-a-hybrid-query); see how to install it [here](./connect-query-from-duckdb-cli.mdx).

Understanding SQL is valuable, but no other programming experience is needed.

## Running your first query

### Query from a shared database

Head over to the [MotherDuck UI](https://app.motherduck.com/) and, after logging in, go to the notebook and add a new cell. Before playing with the dataset we just downloaded, let's run a couple of simple queries on the shared sample database. This database contains a series of MotherDuck's public datasets and it's *auto-attached* for each user, meaning it's accessible directly within your MotherDuck session without any additional setup.

We will query the NYC 311 dataset first. This dataset contains over thirty million complaints citizens have filed with the New York City government. We'll select several columns and look at the complaints filed over a few days to demonstrate the [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/) feature of the MotherDuck UI.

```sql
SELECT created_date, agency_name, complaint_type, descriptor, incident_address, resolution_description
FROM sample_data.nyc.service_requests
WHERE created_date >= '2022-03-27' AND created_date <= '2022-03-31';
```

![UI Capability](./img/screenshot_tutorial_ui_capability_v2.png)

Take a moment to explore the shape of this data with the Column Explorer. For the remainder of this tutorial, we'll focus on the NYC taxi data and perform aggregation queries representative of the types of queries often performed in analytics databases. We will first get the average fare based on the number of passengers. The source dataset covers the whole month of November 2022.

```sql
SELECT passenger_count, avg(total_amount) -- reading from shared sample database
FROM sample_data.nyc.taxi
GROUP BY passenger_count
ORDER BY passenger_count;
```

![Query Result](./img/screenshot_tutorial_result_v2.png)

:::info
The `sample_data` database is auto-attached, but for any other shared database you would like to read, you need to use the `ATTACH` statement. Read more about querying a shared MotherDuck database **[here](/key-tasks/sharing-data/sharing-data.mdx)**.
:::

You can also run the same queries using the DuckDB CLI. You just need to connect to MotherDuck first using the `ATTACH 'md:';` command.
You will be prompted to authenticate if there is no `motherduck_token` found in your environment.

```sql
ATTACH 'md:';
```

```sql
SELECT created_date, agency_name, complaint_type, descriptor, incident_address, resolution_description
FROM sample_data.nyc.service_requests
WHERE created_date >= '2022-03-27' AND created_date <= '2022-03-31';
```

```sql
SELECT passenger_count, avg(total_amount)
FROM sample_data.nyc.taxi
GROUP BY passenger_count
ORDER BY passenger_count;
```

### Query from S3

Our shared sample database is great to play with, but you probably want to use your own data on AWS S3. Let's see how to do that. The sample database's source data is actually available on our public AWS S3 bucket. Let's run the exact same query, but instead of pointing to a MotherDuck table, we will point to a Parquet file on S3.

For a secured bucket, we need to pass the AWS credentials; check [authenticating to S3](../integrations/cloud-storage/amazon-s3.mdx) for more information. Here's the updated query, reading from S3:

```sql
SELECT passenger_count, avg(total_amount) -- reading from AWS S3 parquet files
FROM read_parquet('s3://us-prd-motherduck-open-datasets/nyc_taxi/parquet/yellow_cab_nyc_2022_11.parquet')
GROUP BY passenger_count
ORDER BY passenger_count;
```

:::info
If your data is a CSV, you can use the `read_csv_auto()` function instead of `read_parquet()`. Similarly, for JSON it's `read_json_auto()`.
:::

## Loading your dataset

Click the "ADD FILES" button and select the [dataset](https://us-prd-motherduck-open-datasets.s3.amazonaws.com/misc/csv/popular_currency_rate_dollar.csv) you just downloaded.

![Add file](./img/screenshot_tutorial_add_file.png)

A cell will be automatically created with the following code:

![create statement](./img/screenshot_tutorial_create_statement.png)

```sql
CREATE TABLE my_db.popular_currency_rate_dollar AS SELECT * FROM read_csv_auto(['popular_currency_rate_dollar.csv']);
```

Run this cell to create a table containing the CSV's data. You can also rename the table by changing the name after the `CREATE TABLE` statement. You can now run queries on this MotherDuck table. For example, let's see the top 10 rows of the table:

```sql
FROM my_db.popular_currency_rate_dollar LIMIT 10;
```

![create statement](./img/screenshot_tutorial_result_currency.png)

In the DuckDB CLI, the equivalent steps are:

```sql
ATTACH 'md:';
USE my_db;
CREATE TABLE my_db.popular_currency_rate_dollar AS SELECT * FROM read_csv_auto(['./popular_currency_rate_dollar.csv']);
```

:::note
Don't forget to adapt the file path to the actual local path where you downloaded the `csv` file.
:::

## Running a hybrid query

To experience [hybrid query execution](../key-tasks/running-hybrid-queries.md), we'll need to use the DuckDB CLI. With the local CSV `popular_currency_rate_dollar.csv`, which you should have downloaded in the steps above, we will run a query that combines local data with data from the `sample_data` cloud database.

Our file, `popular_currency_rate_dollar.csv`, contains currency rates against the U.S. dollar over a few days. Let's use the same query we used above to determine the average fare. However, instead of presenting the results in dollars, we're interested in seeing them in British Pounds (GBP).
```sql
SELECT
    cr.currency_code,
    t.passenger_count,
    AVG(t.total_amount * cr.exchange_rate) AS average_converted_amount
FROM sample_data.nyc.taxi t
CROSS JOIN
    -- reading from local csv, adapt the path where you downloaded the file
    (SELECT * FROM read_csv_auto('./popular_currency_rate_dollar.csv')) cr
WHERE cr.currency_code = 'GBP'
GROUP BY cr.currency_code, t.passenger_count
ORDER BY t.passenger_count ASC;
```

```
┌───────────────┬─────────────────┬──────────────────────────┐
│ currency_code │ passenger_count │ average_converted_amount │
│    varchar    │     double      │          double          │
├───────────────┼─────────────────┼──────────────────────────┤
│ GBP           │             0.0 │       15.932329767528195 │
│ GBP           │             1.0 │        16.68244975354177 │
│ GBP           │             2.0 │       18.939035313855573 │
│ GBP           │             3.0 │        18.24300645274264 │
│ GBP           │             4.0 │       19.073578370153896 │
│ GBP           │             5.0 │       16.526609827337477 │
│ GBP           │             6.0 │        16.91326606429221 │
│ GBP           │             7.0 │        59.94501665999999 │
│ GBP           │             8.0 │        48.38727310588234 │
│ GBP           │             9.0 │              59.48654116 │
│ GBP           │                 │       23.031504804070522 │
├───────────────┴─────────────────┴──────────────────────────┤
│ 11 rows                                           3 columns │
└─────────────────────────────────────────────────────────────┘
```

You can inspect the execution plan using `EXPLAIN`:

```sql
EXPLAIN
SELECT
    cr.currency_code,
    t.passenger_count,
    AVG(t.total_amount * cr.exchange_rate) AS average_converted_amount
FROM sample_data.nyc.taxi t
CROSS JOIN
    -- reading from local csv, adapt the path where you downloaded the file
    (SELECT * FROM read_csv_auto('./popular_currency_rate_dollar.csv')) cr
WHERE cr.currency_code = 'GBP'
GROUP BY cr.currency_code, t.passenger_count
ORDER BY t.passenger_count ASC;
```

Each operation is followed by either `(L)` = Local or `(R)` = Remote.

```
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│    DOWNLOAD_SOURCE (L)    │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        bridge_id: 1       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│  BATCH_DOWNLOAD_SINK (R)  │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        bridge_id: 1       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│        ORDER_BY (R)       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          ORDERS:          │
│   t.passenger_count ASC   │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│     HASH_GROUP_BY (R)     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
│             #1            │
│          avg(#2)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       PROJECTION (R)      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│       currency_code       │
│      passenger_count      │
│      (total_amount *      │
│       exchange_rate)      │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│     CROSS_PRODUCT (R)     ├──────────────┐
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│        SEQ_SCAN (R)       ││     UPLOAD_SOURCE (R)     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│  yellow_cab_nyc_2022_11   ││        bridge_id: 2       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
│      passenger_count      ││                           │
│        total_amount       ││                           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
│        EC: 3252717        ││                           │
└───────────────────────────┘└─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │      UPLOAD_SINK (L)      │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │        bridge_id: 2       │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │         FILTER (L)        │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │  (currency_code = 'GBP')  │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │          EC: 132          │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │     READ_CSV_AUTO (L)     │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │           LOCAL           │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │       currency_code       │
                             │       exchange_rate       │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │           EC: 0           │
                             └───────────────────────────┘
```
Finally, let's create a database with a table based on this query and share the result.

```sql
CREATE DATABASE holiday_budget;
```

```sql
CREATE TABLE holiday_budget.taxi_nyc_fare AS
SELECT
    cr.currency_code,
    t.passenger_count,
    AVG(t.total_amount * cr.exchange_rate) AS average_converted_amount
FROM sample_data.nyc.taxi t
CROSS JOIN
    -- reading from local csv, adapt the path where you downloaded the file
    (SELECT * FROM read_csv_auto('./popular_currency_rate_dollar.csv')) cr
WHERE cr.currency_code = 'GBP'
GROUP BY cr.currency_code, t.passenger_count
ORDER BY t.passenger_count ASC;
```

## Sharing your database

Now that you have a new dataset and a database, you can share it with your colleagues. To do so, we'll create a share, which creates a point-in-time snapshot of the database.

Click on the drop-down menu next to the database you want to share:

![share 1](./img/screenshot_tutorial_share_1.png)

You will be prompted with a window to create a share.

![share 2](./img/screenshot_tutorial_share_2.png)

The syntax to create a share visible to everyone in your Organization is `CREATE SHARE <share_name> FROM <database_name>`.

```sql
CREATE SHARE duck_holiday_budget FROM holiday_budget (ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE);
```

```
┌───────────────────────────────────────────────────────────────┐
│                           share_url                           │
│                            varchar                            │
├───────────────────────────────────────────────────────────────┤
│ md:_share/holiday_budget/b556630d-74f1-435c-9459-cfb87d349cb3 │
└───────────────────────────────────────────────────────────────┘
```

Now everyone in your Organization will see this share in the UI under "Shared with me". They simply need to press "Attach" to start querying!

Learn more about sharing in MotherDuck [here](../key-tasks/sharing-data/sharing-within-org.md).

## Going further

Try it with your own data! Look at our [supported integrations](/integrations) and keep coding, keep quacking.

---

---
title: Welcome to MotherDuck!
sidebar_class_name: getting-started-icon
description: Getting started with MotherDuck serverless cloud analytics service.
---

import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';

## Quick start

* 📝 [Sign up](https://app.motherduck.com/?auth_flow=signup) for a free account
* 👇 Read this [MotherDuck overview](#motherduck-overview) (you're here!)
* 🧭 Take a [tour of the MotherDuck UI](./motherduck-quick-tour.md)
* 🚀 Complete the [MotherDuck tutorial](./e2e-tutorial.md)
* 📈 Learn how to connect and query from [BI and Visualization Tools](/integrations/)
* 🖥️ Learn how to connect and query from the [DuckDB CLI](./connect-query-from-duckdb-cli.mdx)
* 📦 Learn how to connect and query from the [Python DuckDB package](./connect-query-from-python/installation-authentication.md)

Questions? Join the pond on [our Slack community!](https://slack.motherduck.com/)

## MotherDuck overview

[MotherDuck](https://motherduck.com) is a [cloud data warehouse](https://motherduck.com/product/data-teams) and SQL analytics backend for [building data apps](https://motherduck.com/product/app-developers/). MotherDuck is powered by DuckDB, a free and open-source in-process OLAP database developed and maintained by the [DuckDB community](https://duckdb.org/) and [DuckDB Labs](https://duckdblabs.com/). MotherDuck is a separate organization closely partnering with DuckDB Labs to build a DuckDB-based cloud analytics service.
As a DuckDB user, you can connect to MotherDuck to supercharge your local DuckDB experience with cloud-based manageability, persistence, scale, sharing, and productivity tools. MotherDuck runs on vanilla DuckDB, so your existing DuckDB skills and tools carry over.

With MotherDuck you can:

- Use serverless DuckDB in the cloud to store data and execute DuckDB SQL
- Load data into MotherDuck [from your personal computer, https, or cloud storage](/key-tasks/loading-data-into-motherduck/loading-data-into-motherduck.mdx) (e.g. Amazon S3 or Azure Cloud Storage)
- Securely save cloud storage credentials in MotherDuck
- [Share databases](/key-tasks/sharing-data/sharing-data.mdx) with your teammates
- Check out [our other integrations](/integrations) (Golang, R, etc.), or use any DuckDB client and connect through `"md:"`

:::note
MotherDuck currently supports DuckDB and it is compatible with any client version through
:::

---

---
sidebar_position: 3
title: Using MotherDuck UI
description: Learn to use the MotherDuck Web UI to configure and query databases
---

## Login

To log in to the MotherDuck UI, go to [app.motherduck.com](https://app.motherduck.com). You will be redirected to our web UI.

:::info
Note you can also connect to the MotherDuck UI directly when starting the DuckDB CLI by running the following command:

```bash
duckdb "md:" -ui
```
:::

## Main Window

![UI](./img/screenshot_ui.png)

## Executing a sample query

After you log in, run the following SQL query:

```sql
SELECT country_name, city, pm25_concentration AS pm25_pollution
FROM sample_data.who.ambient_air_quality
WHERE year=2019 AND pm25_concentration IS NOT NULL
ORDER BY pm25_pollution ASC
```

This query accesses the [Sample Data Database](/getting-started/sample-data-queries/datasets), which is [attached](/key-tasks/sharing-data/sharing-data.mdx) by default. MotherDuck executes this query in the cloud. Query results are saved in your browser in an interactive panel for fast data exploration with data sorting, filtering, and pivoting.

![Query Result](./img/ui_query_results.gif)

You can also click the Expand button on the top right of each cell to expand the editor and results area.

![Expand cells](./img/screenshot_expand_cells.png)

## Diving into your data with Column Explorer
### Exploring tables or resultsets

The Column Explorer allows you to see stats on either a selected table or the resultset from the selected notebook cell.

### Seeing value frequencies

For each column, you'll see the column type, the most commonly occurring values, and the percentage of values that are NULL. If the values are numerical, you'll see a histogram visualization.

### Charting data over time

If you have timestamp data, you'll also see a chart in the Column Explorer with automatic binning over time.

The Column Explorer is collapsible by clicking the toggle on the top right.

![Collapse column explorer](./img/screenshot_collapse_column_explorer.png)

### Dig into your results in the Cell Content Pane

Click on a cell in your results to see its full contents.

![UI](./img/cell_content_long_text.png)

#### Interact with JSON values

Expand, collapse, and copy content from JSON-type columns. You can also copy the keypath to a specific value, or the value itself!

![UI](./img/cell_content_json.png)

## Writing queries with Autocomplete

The MotherDuck Web UI supports autocomplete. As you write SQL in the UI, on every keystroke autocomplete brings up query syntax suggestions. You can turn off autocomplete in the Web UI settings, found by clicking your profile in the top-left and choosing "Settings" followed by "Preferences."

## Writing SQL with confidence using FixIt

[FixIt](/key-tasks/writing-sql-with-ai#automatically-fix-sql-errors-in-the-webui) helps you resolve common SQL errors by offering fixes in-line. FixIt uses a large language model (LLM) to generate suggestions; it feeds the error, the query, and additional context into an LLM to generate a new line that fixes the query.

## Settings

MotherDuck settings are found by clicking your profile at the top-left. These settings are specific to each MotherDuck user and organization.

### General: Access Tokens

This section allows you to create access tokens, which can be used for programmatically [authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/). Tokens can have expiry dates.

### Organization

This section allows you to change the display name of the organization. You can also enable all users in your email domain to join you in your MotherDuck organization. See ["Managing Organizations"](../../key-tasks/managing-organizations/#enabling-all-users-in-a-domain-to-join-your-organization) for more information.

### Members

Shows you the members with access to this organization and allows you to invite new members to join you in MotherDuck.

### Preferences: UI settings

* Enable [autocomplete when typing](#writing-queries-with-autocomplete)
* Enable inline [SQL error fix suggestions](/key-tasks/writing-sql-with-ai#automatically-fix-sql-errors-in-the-webui)

### Secrets

MotherDuck enables you to query cloud blob stores without supplying credentials each time. Currently, credentials are supported for [AWS S3](/integrations/cloud-storage/amazon-s3), [Azure Blob Storage](/integrations/cloud-storage/azure-blob-storage), [Google Cloud Storage (GCS)](/integrations/cloud-storage/google-cloud-storage), Cloudflare R2, and Hugging Face.

### Plans

Shows your current plan (e.g. Free, Lite, or Business) and allows you to switch plans.

### Billing

Displays your current plan, primary billing email address, and estimated invoices and usage during the free trial. After the free trial, you can see actual usage and access your invoices.

## Keyboard shortcuts

MotherDuck supports the following keyboard shortcuts. Use `Ctrl` for Windows/Linux and `⌘` (Command) for Mac. Use `Alt` for Windows/Linux and `⌥` (Option) for Mac.

| Command | Action |
|---------|--------|
| `Ctrl`/`⌘` + `Enter` | Run the current cell. |
| `Ctrl`/`⌘` + `Shift` + `Enter` | Run selected text in the current cell. If no text is selected, run the whole cell. |
| `Shift` + `Enter` or `Alt`/`⌥` + `Enter` | Run the current cell, then advance to the next cell, creating a new one if necessary. |
| `Tab` | When editing a query, indent current line. When navigating the notebook, advance to next UI element/button. Hit `Esc` to allow `Tab` to no longer indent current line and advance to next UI element instead. |
| `Shift` + `Tab` | When editing a query, de-indent current line. When navigating the notebook, move to previous UI element/button. Hit `Esc` to allow `Shift` + `Tab` to no longer de-indent current line and move to previous UI element instead. |
| `Esc` | Change `Tab` key behavior to navigate the UI instead of indent/de-indent editor text. Once another cell is selected, `Tab` behavior reverts to indent/de-indent. |
| `Ctrl`/`⌘` + `/` | Toggle comment on all highlighted lines in query editor by prepending or removing `--` at the start of each line. Will comment out the highlighted lines unless all lines are already commented out, in which case it will uncomment. |
| `Ctrl`/`⌘` + `z` | Undo query edits within currently selected cell. |
| `Ctrl`/`⌘` + `Shift` + `z` | Redo query edits within currently selected cell. |
| `Ctrl`/`⌘` + `↑` | Move currently selected cell up. |
| `Ctrl`/`⌘` + `↓` | Move currently selected cell down. |

---

---
sidebar_position: 3
title: Air Quality
description: Sample data from the WHO Ambient Air Quality Database to use with DuckDB and MotherDuck
---

## About the dataset

The [WHO Ambient Air Quality Database](https://www.who.int/publications/m/item/who-ambient-air-quality-database-(update-2023)) (6th edition, released in **May 2023**) compiles annual mean concentrations of nitrogen dioxide (NO2) and particulate matter (PM10, PM2.5) from ground measurements across over 8600 human settlements in more than 120 countries. This data, updated every 2-3 years since **2011**, primarily represents city or town averages and is used to monitor the Sustainable Development Goal Indicator 11.6.2, Air quality in cities.

Here's the schema:

| column_name | column_type | null | key | default | extra |
|--------------------|-------------|------|-----|---------|-------|
| who_region | VARCHAR | YES | | | |
| iso3 | VARCHAR | YES | | | |
| country_name | VARCHAR | YES | | | |
| city | VARCHAR | YES | | | |
| year | BIGINT | YES | | | |
| version | VARCHAR | YES | | | |
| pm10_concentration | BIGINT | YES | | | |
| pm25_concentration | BIGINT | YES | | | |
| no2_concentration | BIGINT | YES | | | |
| pm10_tempcov | BIGINT | YES | | | |
| pm25_tempcov | BIGINT | YES | | | |
| no2_tempcov | BIGINT | YES | | | |
| type_of_stations | VARCHAR | YES | | | |
| reference | VARCHAR | YES | | | |
| web_link | VARCHAR | YES | | | |
| population | VARCHAR | YES | | | |
| population_source | VARCHAR | YES | | | |
| latitude | FLOAT | YES | | | |
| longitude | FLOAT | YES | | | |
| who_ms | BIGINT | YES | | | |

To read from the `sample_data` database, please refer to [attach the sample datasets database](./datasets.mdx).

## Example queries

### Annual city air quality rating

This query assesses the average annual air quality in different cities per year based on WHO guidelines. It calculates the average concentrations of PM2.5, PM10, and NO2, then assigns an air quality rating of 'Good', 'Moderate', or 'Poor'. 'Good' indicates all pollutants are within WHO recommended levels, 'Poor' indicates all pollutants exceed WHO recommended levels, and 'Moderate' refers to any other scenario. The results are grouped and ordered by city and year.

```sql
SELECT
    city,
    year,
    CASE
        WHEN AVG(pm25_concentration) <= 10 AND AVG(pm10_concentration) <= 20 AND AVG(no2_concentration) <= 40 THEN 'Good'
        WHEN AVG(pm25_concentration) > 10 AND AVG(pm10_concentration) > 20 AND AVG(no2_concentration) > 40 THEN 'Poor'
        ELSE 'Moderate'
    END AS airqualityrating
FROM sample_data.who.ambient_air_quality
GROUP BY city, year
ORDER BY city, year;
```

### Yearly average pollutant concentrations of a city

This query calculates the yearly average concentrations of PM2.5, PM10, and NO2 in a given city, here `Berlin`.
```sql
SELECT
    year,
    AVG(pm25_concentration) AS avg_pm25,
    AVG(pm10_concentration) AS avg_pm10,
    AVG(no2_concentration) AS avg_no2
FROM sample_data.who.ambient_air_quality
WHERE city = 'Berlin'
GROUP BY year
ORDER BY year DESC;
```

---

---
title: Example Datasets
description: A collection of open datasets and queries to get you started with DuckDB and MotherDuck
---

We have prepared a series of datasets for you to play with as you dive into MotherDuck! The database `sample_data` is readily available for all new users, as it's automatically attached to your account.

Other databases are available for you to attach, and you can do so by running the following command:

```sql
ATTACH '<share_url>' AS <database_name>;
```

| Database | `schema.table` | Description | Share URL | Attached by default |
|-----------------|---------------------------------------------|-----------------------------------------------------------------|------------------------------------------------------------------|---------------------|
| sample_database | [`who.ambient_air_quality`](air-quality.md) | Historical air quality data from the World Health Organization. | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_database | [`nyc.taxi`](nyc-311-data.md) | Taxi ride data from November 2022 | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_database | [`nyc.rideshare`](nyc-311-data.md) | Ride share trips (Lyft, Uber, etc.) in NYC | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_database | [`nyc.service_requests`](nyc-311-data.md) | Requests to NYC's 311 complaint hotline via phone and web | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_database | [`hn.hacker_news`](hacker-news.md) | Sample of comments from [Hacker News](https://news.ycombinator.com/) | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_database | `kaggle.movies` | Sample of the movies dataset from [Kaggle](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset) | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_database | `stackoverflow_survey.survey_results` | Survey results from 2017 to 2024 | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_database | `stackoverflow_survey.survey_schemas` | Survey schemas (questions from the survey) from 2017 to 2024 | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| stackoverflow | [`main.badges`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.comments`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.post_links`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.posts`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.tags`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.votes`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.users`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| duckdb_stats | [`PyPi data on DuckDB project`](pypi.md) | Python download data from PyPI for the `duckdb` package, refreshed daily | `'md:_share/duckdb_stats/1eb684bf-faff-4860-8e7d-92af4ff9a410'` | No |
| hacker_news | [`hacker_news.hacker_news`](hacker-news.md) | Full [Hacker News](https://news.ycombinator.com/) dataset from 2016 to 2025 | `'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b'` | No |
| foursquare | [`foursquare.fsq_os_places`](foursquare.md) | A global dataset of over 100 million points of interest (POIs) with detailed location, business, and contact information. | `'md:_share/foursquare/0cbf467d-03b0-449e-863a-ce17975d2c0b'` | No |
| foursquare | [`foursquare.fsq_os_categories`](foursquare.md) | A hierarchical classification of POIs with up to six levels, detailing category names and IDs. | `'md:_share/foursquare/0cbf467d-03b0-449e-863a-ce17975d2c0b'` | No |

---

---
sidebar_position: 4
title: Foursquare
description: Foursquare Open Source Places (FSQ OS Places) is a global, open-source dataset of over 100 million points of interest (POI)
---

## About the dataset

[Foursquare](https://docs.foursquare.com/data-products/docs/fsq-places-open-source) Open Source Places (FSQ OS Places) is a global, open-source dataset of over 100 million points of interest (POI), featuring 22 core attributes, updated monthly, and designed to support geospatial applications with a collaborative, AI- and human-powered data curation system.

The upstream dataset is updated monthly; we host a snapshot from 2025-01-10. There are two tables:

- `fsq_os_places` (Places): a global dataset of over 100 million points of interest (POIs) with detailed location, business, and contact information.
- `fsq_os_categories` (Categories): a hierarchical classification of POIs with up to six levels, detailing category names and IDs.

You can attach the `foursquare` database to your account by running the following command:

```sql
ATTACH 'md:_share/foursquare/0cbf467d-03b0-449e-863a-ce17975d2c0b' AS foursquare;
```

## Schema

### fsq_os_places - Places Dataset

| Column Name | Type | Description |
|--------------------|------------------|---------------|
| fsq_place_id | String | The unique identifier of a Foursquare POI. Use this ID to view a venue at: `foursquare.com/v/{fsq_place_id}` |
| name | String | Business name of a POI |
| latitude/longitude | Decimal | Decimal coordinates (WGS84 datum) up to 6 decimal places. Derived from third-party sources, user input, and corrections. Default geocode type: front door or rooftop. |
| address | String | User-entered street address of the venue |
| locality | String | City, town, or equivalent where the POI is located |
| region | String | State, province, or territory.
Abbreviations used in US, CA, AU, BR; full names elsewhere | | postcode | String | Postal code or equivalent, formatted based on country (e.g., 5-digit US ZIP code) | | admin_region | String | Additional sub-division (e.g., Scotland) | | post_town | String | Town/place used in postal addressing (may differ from geographic location) | | po_box | String | Post Office Box | | country | String | 2-letter ISO Country Code | | date_created | Date | Date the POI entered the database (not necessarily the opening date) | | date_refreshed | Date | Last date any reference was refreshed via crawl, users, or validation | | date_closed | Date | Date the POI was marked closed in the database (not necessarily actual closure date) | | tel | String | Telephone number with local formatting | | website | String | URL to the POI’s (or chain’s) website | | email | String | Primary contact email address, if available | | facebook_id | String | POI's Facebook ID, if available | | instagram | String | POI's Instagram handle, if available | | twitter | String | POI's Twitter handle, if available | | fsq_category_ids | Array (String) | ID(s) of the most granular category(ies). See the Categories page for details | | fsq_category_labels| Array (String) | Label(s) of the most granular category(ies). See the Categories page for details | | placemaker_url | String | Link to the POI’s review page in PlaceMaker Tools for suggesting edits or reviewing pending changes | | geom | wkb | Geometry of the POI in WKB format for visualization through the vector tiling service | | bbox | struct | An area defined by two longitudes and two latitudes: latitude is a decimal number between -90.0 and 90.0; longitude is a decimal number between -180.0 and 180.0. `bbox:struct xmin:double ymin:double xmax:double ymax:double` | --- ### fsq_os_categories - Category Dataset | Column Name | Type | Description | |----------------------|---------|-----------------------------------------------------------------------------------------------------| | category_id | String | Unique identifier of the Foursquare category (BSON format) | | category_level | Integer | Hierarchy depth of the category (1-6) | | category_name | String | Name of the most granular category | | category_label | String | Full category hierarchy separated by `>` | | level1_category_id | String | Unique ID of the first-level category | | level1_category_name | String | Name of the first-level category | | level2_category_id | String | Unique ID of the second-level category | | level2_category_name | String | Name of the second-level category | | level3_category_id | String | Unique ID of the third-level category | | level3_category_name | String | Name of the third-level category | | level4_category_id | String | Unique ID of the fourth-level category | | level4_category_name | String | Name of the fourth-level category | | level5_category_id | String | Unique ID of the fifth-level category | | level5_category_name | String | Name of the fifth-level category | | level6_category_id | String | Unique ID of the sixth-level category | | level6_category_name | String | Name of the sixth-level category | --- --- sidebar_position: 2 title: Hacker News description: Sample data from Hacker News stories to use for SQL querying of DuckDB and MotherDuck databases. --- ## About the dataset [Hacker News](https://news.ycombinator.com/) is a social news website focusing on computer science and entrepreneurship. 
It is run by Y Combinator, a startup accelerator, and it's known for its minimalist interface. Users can post stories (such as links to articles), comment on them, and vote them up or down, affecting their visibility.

There are two ways to access the dataset:

- Through the `sample_data` database, which contains a sample of the data (from **January 2022** to **November 2022**)
- Through the `hacker_news` database, which contains the full dataset (from **2016** to **2025**)

To attach the `sample_data` database, you can use the following command:

```sql
ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data;
```

To attach the `hacker_news` database, you can use the following command:

```sql
ATTACH 'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b' AS hacker_news;
```

## Schema

| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|-----|---------|-------|
| title | VARCHAR | YES | | | |
| url | VARCHAR | YES | | | |
| text | VARCHAR | YES | | | |
| dead | BOOLEAN | YES | | | |
| by | VARCHAR | YES | | | |
| score | BIGINT | YES | | | |
| time | BIGINT | YES | | | |
| timestamp | TIMESTAMP | YES | | | |
| type | VARCHAR | YES | | | |
| id | BIGINT | YES | | | |
| parent | BIGINT | YES | | | |
| descendants | BIGINT | YES | | | |
| ranking | BIGINT | YES | | | |
| deleted | BOOLEAN | YES | | | |

To read from the `sample_data` database, please refer to [attach the sample datasets database](./datasets.mdx).

## Example queries

### Most shared websites

This query returns the top domains being shared on Hacker News.

```sql
SELECT
    regexp_extract(url, 'http[s]?://([^/]+)/', 1) AS domain,
    count(*) AS count
FROM sample_data.hn.hacker_news
WHERE url IS NOT NULL
  AND regexp_extract(url, 'http[s]?://([^/]+)/', 1) != ''
GROUP BY domain
ORDER BY count DESC
LIMIT 20;
```

### Most Commented Stories Each Month

This query calculates the total number of comments for each story and identifies the most commented story of each month.

```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        descendants AS nb_comments,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER() OVER (
            PARTITION BY YEAR(timestamp), MONTH(timestamp)
            ORDER BY descendants DESC
        ) AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)
SELECT year, month, title, hn_url, nb_comments
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```

### Most monthly voted stories

This query determines the most voted story for each month.

```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        score,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER() OVER (PARTITION BY YEAR(timestamp), MONTH(timestamp) ORDER BY score DESC) AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)
SELECT year, month, title, hn_url, score
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```

### Keyword analysis

This query counts the monthly mentions of a keyword (here `duckdb`) in the title or text of Hacker News posts, organized by year and month.

```sql
SELECT
    YEAR(timestamp) AS year,
    MONTH(timestamp) AS month,
    COUNT(*) AS keyword_mentions
FROM sample_data.hn.hacker_news
WHERE (title LIKE '%duckdb%' OR text LIKE '%duckdb%')
GROUP BY year, month
ORDER BY year ASC, month ASC;
```

---

---
sidebar_position: 4
title: NYC 311 Complaint Data
This data can be used as sample data for DuckDB and MotherDuck SQL queries. --- ## About the dataset The [New York City 311 Service Requests Data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9) provides information on requests to the city's complaint service from 2010 to the present. NYC311 responds to thousands of inquiries, comments and requests from customers every single day. This dataset represents only service requests that can be directed to specific agencies. This dataset is updated daily, and the expected values for many fields will change over time. The lists of expected values associated with each column are not exhaustive. Each row of data contains information about the service request, including complaint type, responding agency, and geographic location. However, the data does not reveal any personally identifying information about the customer who made the request. This dataset describes site-specific non-emergency complaints (also known as “service requests”) made by customers across New York City about a variety of topics, including noise, sanitation, and street quality. The columns have been renamed to `lower_case_underscore` format for ease of typing. For more column details than are provided below, see the associated data dictionary (an Excel file) linked from the page above. | column_name | column_type | null | description | |--------------------------------|---------------|--------|-------------| | unique_key | BIGINT | YES | Unique identifier of a Service Request (SR) in the open data set. Each 311 service request is assigned a number that distinguishes it as a separate case incident. | | created_date | TIMESTAMP | YES | The date and time that a Customer submits a Service Request. | | closed_date | TIMESTAMP | YES | The date and time that an Agency closes a Service Request. | | agency | VARCHAR | YES | Acronym of responding City Government Agency or entity responding to 311 Service Request. | | agency_name | VARCHAR | YES | Full agency name of responding City Government Agency, or entity responding to 311 service request. | | complaint_type | VARCHAR | YES | This is the first level of a hierarchy identifying the topic of the incident or condition. Complaint Type broadly describes the topic of the incident or condition and is defined by the responding agency. | | descriptor | VARCHAR | YES | This is associated with the Complaint Type, and provides further detail on the incident or condition. Descriptor values are dependent on the Complaint Type, and are not always required in the service request.
| | location_type | VARCHAR | YES | Describes the type of location used in the address information | | incident_zip | VARCHAR | YES | Zip code of the incident address | | incident_address | VARCHAR | YES | House number and street name of incident address | | street_name | VARCHAR | YES | Street name of incident address | | cross_street_1 | VARCHAR | YES | First cross street based on the geo validated incident location. | | cross_street_2 | VARCHAR | YES | Second cross street based on the geo validated incident location | | intersection_street_1 | VARCHAR | YES | First intersecting street based on geo validated incident location | | intersection_street_2 | VARCHAR | YES | Second intersecting street based on geo validated incident location | | address_type | VARCHAR | YES | Type of information available about the incident location: Address; Block face; Intersection; LatLong; Placename | | city | VARCHAR | YES | In this dataset, City can refer to a borough or neighborhood: MANHATTAN, BROOKLYN, BRONX, STATEN ISLAND, or, in QUEENS, a specific neighborhood name | | landmark | VARCHAR | YES | If the incident location is identified as a Landmark, the name of the landmark will display here. Can refer to any noteworthy location, including but not limited to, parks, hospitals, airports, sports facilities, performance spaces, etc. | | facility_type | VARCHAR | YES | If applicable, this field describes the type of city facility associated with the service request: DSNY Garage, Precinct, School, School District, N/A | | status | VARCHAR | YES | Current status of the service request submitted: Assigned, Canceled, Closed, Pending | | due_date | TIMESTAMP | YES | Date when responding agency is expected to update the SR. This is based on the Complaint Type and internal Service Level Agreements (SLAs) | | resolution_description | VARCHAR | YES | Describes the last action taken on the service request by the responding agency. May describe next or future steps. | | resolution_action_updated_date | TIMESTAMP | YES | Date when responding agency last updated the service request. | | bbl | VARCHAR | YES | Parcel number that identifies the location of the building or property associated with the service request. The block is a subset of a borough. The lot is a subset of a block, unique within a borough and block. | | borough | VARCHAR | YES | The borough number is: 1. Manhattan (New York County) 2. Bronx (Bronx County) 3. Brooklyn (Kings County) 4. Queens (Queens County) 5. Staten Island (Richmond County) | | x_coordinate_state_plane | VARCHAR | YES | Geo validated X coordinate of the incident location. For more information about NY State Plane Coordinate Zones: http://gis.ny.gov/gisdata/metadata/nysogs.statepln.html?nysgis= | | y_coordinate_state_plane | VARCHAR | YES | Geo validated Y coordinate of the incident location.
For more information about NY State Plane Coordinate Zones: http://gis.ny.gov/gisdata/metadata/nysogs.statepln.html?nysgis= | | open_data_channel_type | VARCHAR | YES | Indicates how the service request was submitted to 311: Phone, Online, Other (submitted by other agency) | | park_facility_name | VARCHAR | YES | If the service request pertains to a facility managed by NYC Parks (DPR), the name of the facility will appear here | | park_borough | VARCHAR | YES | The borough of the incident, if the service request pertains to an NYC Parks (DPR) facility | | vehicle_type | VARCHAR | YES | Data provided if service request pertains to a vehicle managed by the Taxi and Limousine Commission (TLC): Ambulette / Paratransit; Car Service; Commuter Van; Green Taxi | | taxi_company_borough | VARCHAR | YES | Data provided if service request pertains to a vehicle managed by the Taxi and Limousine Commission (TLC). | | taxi_pick_up_location | VARCHAR | YES | If the incident pertains to a vehicle managed by the Taxi and Limousine Commission (TLC), this field displays the taxi pick-up location | | bridge_highway_name | VARCHAR | YES | If the incident is identified as a Bridge/Highway, the name will be displayed here | | bridge_highway_direction | VARCHAR | YES | If the incident is identified as a Bridge/Highway, the direction where the issue took place will be displayed here. | | road_ramp | VARCHAR | YES | If the incident location was a Bridge/Highway, this column differentiates whether the issue was on the Road or the Ramp. | | bridge_highway_segment | VARCHAR | YES | Additional information on the section of the Bridge/Highway where the incident took place. | | latitude | DOUBLE | YES | Geo based Latitude of the incident location in decimal degrees | | longitude | DOUBLE | YES | Geo based Longitude of the incident location in decimal degrees | | community_board | VARCHAR | YES | Community boards are local representative bodies. There are 59 community boards throughout the City, each representing a distinct geography. For more information on Community Boards: [NYC government website](https://www.nyc.gov/site/cau/community-boards/community-boards.page) | To read from the `sample_data` database, please refer to [attach the sample datasets database](./datasets.mdx). ## Example queries ### The most common complaints in 2018 ```sql SELECT UPPER(complaint_type), COUNT(1) FROM sample_data.nyc.service_requests WHERE DATE_PART('year', created_date) = 2018 GROUP BY 1 HAVING COUNT(*) > 1000 ORDER BY 2 DESC; ``` --- --- sidebar_position: 5 title: PyPI Data description: Want to know how users find and install software you've developed for the Python community? This DuckDB and MotherDuck database allows you to use SQL to perform data analysis on PyPI data. --- ## About the dataset PyPI is the Python Package Index, a repository of software packages for the Python programming language. It is a central repository that allows users to find and install software developed and shared by the Python community. The dataset includes information about packages, releases, and downloads of the `duckdb` Python package. It's refreshed **weekly**, and you can visit the dashboard [here](https://duckdbstats.com). ## How to query the dataset A dedicated shared database is maintained to query the dataset.
To attach it to your workspace, you can use the following command: ```sql ATTACH 'md:_share/duckdb_stats/1eb684bf-faff-4860-8e7d-92af4ff9a410' AS duckdb_stats; ``` ## Schema ### pypi_file_downloads This table contains the raw data. Each row represents a download from PyPI. | column_name | column_type | null | |--------------|----------------------------------------------------------------------------------------------------------------|------| | timestamp | TIMESTAMP | YES | | country_code | VARCHAR | YES | | url | VARCHAR | YES | | project | VARCHAR | YES | | file | STRUCT(filename VARCHAR, project VARCHAR, "version" VARCHAR, "type" VARCHAR) | YES | | details | STRUCT("installer" STRUCT("name" VARCHAR, "version" VARCHAR), "python" VARCHAR, "implementation" STRUCT("name" VARCHAR, "version" VARCHAR), "distro" STRUCT("name" VARCHAR, "version" VARCHAR, "id" VARCHAR, "libc" STRUCT("lib" VARCHAR, "version" VARCHAR)), "system" STRUCT("name" VARCHAR, "release" VARCHAR), "cpu" VARCHAR, "openssl_version" VARCHAR, "setuptools_version" VARCHAR, "rustc_version" VARCHAR, "ci" BOOLEAN) | YES | | tls_protocol | VARCHAR | YES | | tls_cipher | VARCHAR | YES | ### pypi_daily_stats This table is a daily aggregation of the raw data. It contains the following columns: | column_name | column_type | null | |-------------------|-------------|------| | load_id | VARCHAR | YES | | download_date | DATE | YES | | system_name | VARCHAR | YES | | system_release | VARCHAR | YES | | version | VARCHAR | YES | | project | VARCHAR | YES | | country_code | VARCHAR | YES | | cpu | VARCHAR | YES | | python_version | VARCHAR | YES | | daily_download_sum| BIGINT | YES | ## Example queries The following queries assume that the currently connected database is `duckdb_stats`. Run `use duckdb_stats` to switch to it. ### Get weekly download stats ```sql SELECT DATE_TRUNC('week', download_date) AS week_start_date, version, country_code, python_version, SUM(daily_download_sum) AS weekly_download_sum FROM duckdb_stats.main.pypi_daily_stats GROUP BY ALL ORDER BY week_start_date ``` --- --- sidebar_position: 5 title: StackOverflow Survey Data description: Data from the StackOverflow Developer Survey from 2017 to 2024. --- ## About the dataset Each year, [Stack Overflow conducts a survey](https://survey.stackoverflow.co/) of developers to understand the trends in the developer community. The survey covers a wide range of topics, including programming languages, frameworks, databases, and platforms, as well as developer demographics, education, and career satisfaction. Starting in 2017, Stack Overflow has provided a consistent schema and data format for the survey data, making it a great dataset for analyzing trends in the developer community over the years. The source data is a series of CSV files that have been merged into a single schema with two tables for easy querying. ## How to query the dataset This dataset is available as part of the `sample_data` database. This database is auto-attached to any new user's workspace. To re-attach the database, you can use the following command: ```sql ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data; ``` ## Schema ### stackoverflow_survey.survey_results This table contains all the survey results from 2017 to 2024. Each column represents a question from the survey. As questions change from year to year, the columns may vary a bit and the table is quite large. ### stackoverflow_survey.survey_schema This table contains the schema of the survey results.
`qname` is the name of the question, which is also the column name in the `survey_results` table. `question` is the full question text. | Column Name | Column Type | |---------------|-------------| | qname | VARCHAR | | question | VARCHAR | | qid | VARCHAR | | force_resp | VARCHAR | | type | VARCHAR | | selector | VARCHAR | | year | VARCHAR | ## Example queries ### List the most popular programming languages in 2024 ```sql SELECT language, COUNT(*) AS count FROM ( SELECT UNNEST(STRING_SPLIT(LanguageHaveWorkedWith, ';')) AS language FROM sample_data.stackoverflow_survey.survey_results WHERE year = '2024' ) AS languages GROUP BY language ORDER BY count DESC; ``` ### Top 10 Countries with the Most Respondents in 2024 ```sql SELECT Country, COUNT(*) AS Respondents FROM sample_data.stackoverflow_survey.survey_results WHERE year = '2024' GROUP BY Country ORDER BY Respondents DESC LIMIT 10; ``` ### Correlation Between Remote Work and Job Satisfaction in 2024 ```sql SELECT RemoteWork, AVG(CAST(JobSat AS DOUBLE)) AS AvgJobSatisfaction, COUNT(*) AS RespondentCount FROM sample_data.stackoverflow_survey.survey_results WHERE JobSat NOT IN ('NA', 'Slightly satisfied', 'Neither satisfied nor dissatisfied', 'Very dissatisfied', 'Very satisfied', 'Slightly dissatisfied') AND RemoteWork NOT IN ('NA') AND year = '2024' GROUP BY ALL ``` --- --- sidebar_position: 5 title: StackOverflow Data description: Sample data from StackOverflow to use with DuckDB and MotherDuck to understand SQL-based data analytics. --- ## About the dataset [Stack Overflow](https://stackoverflow.com/) is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions. The dataset includes a complete **data dump up to May 2023**, covering posts, comments, users, badges, and related metrics. You can read more about the dataset in our blog series [part 1](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-1/) and [part 2](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-2/). ## How to query the dataset As this dataset is quite large, it's not part of the `sample_data` database. Instead, you can find it as a dedicated shared database.
To attach it to your workspace, you can use the following command: ```sql ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow; ``` ## Schema ### Badges | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | UserId | BIGINT | YES | | | | | Name | VARCHAR | YES | | | | | Date | TIMESTAMP | YES | | | | | Class | BIGINT | YES | | | | | TagBased | BOOLEAN | YES | | | | ### Comments | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | PostId | BIGINT | YES | | | | | Score | BIGINT | YES | | | | | Text | VARCHAR | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | UserId | BIGINT | YES | | | | | ContentLicense | VARCHAR | YES | | | | ### Post Links | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | PostId | BIGINT | YES | | | | | RelatedPostId | BIGINT | YES | | | | | LinkTypeId | BIGINT | YES | | | | ### Posts | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | PostTypeId | BIGINT | YES | | | | | AcceptedAnswerId | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | Score | BIGINT | YES | | | | | ViewCount | BIGINT | YES | | | | | Body | VARCHAR | YES | | | | | OwnerUserId | BIGINT | YES | | | | | LastEditorUserId | BIGINT | YES | | | | | LastEditorDisplayName | VARCHAR | YES | | | | | LastEditDate | TIMESTAMP | YES | | | | | LastActivityDate | TIMESTAMP | YES | | | | | Title | VARCHAR | YES | | | | | Tags | VARCHAR | YES | | | | | AnswerCount | BIGINT | YES | | | | | CommentCount | BIGINT | YES | | | | | FavoriteCount | BIGINT | YES | | | | | CommunityOwnedDate | TIMESTAMP | YES | | | | | ContentLicense | VARCHAR | YES | | | | ### Tags | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | TagName | VARCHAR | YES | | | | | Count | BIGINT | YES | | | | | ExcerptPostId | BIGINT | YES | | | | | WikiPostId | BIGINT | YES | | | | ### Votes | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | PostId | BIGINT | YES | | | | | VoteTypeId | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | ### Users | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | Reputation | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | DisplayName | VARCHAR | YES | | | | | LastAccessDate | TIMESTAMP | YES | | | | | AboutMe | VARCHAR | YES | | | | | Views | BIGINT | YES | | | | | UpVotes | BIGINT | YES | | | | | DownVotes | BIGINT | YES | | | | ## Example queries The following queries assume that the currently connected database is `stackoverflow`. Run `use stackoverflow` to switch to it. ### List the top 5 posts that received the most votes ```sql SELECT posts.Title, COUNT(votes.Id) AS VoteCount FROM posts JOIN votes ON posts.Id = votes.PostId GROUP BY posts.Title ORDER BY VoteCount DESC LIMIT 5; ``` ### Find the top 5 posts with the highest view count ```sql SELECT Title, ViewCount FROM posts ORDER BY ViewCount DESC LIMIT 5; ``` --- --- sidebar_position: 3 title: Evidence --- import BlockWithBacktick from '@site/src/components/BlockWithBacktick'; [Evidence](https://evidence.dev/) is an open source, code-based alternative to drag-and-drop BI tools.
Build polished data products with just SQL and Markdown. ## Getting started Head over to [their installation page](https://docs.evidence.dev/getting-started/install-evidence) and use their template to get started. ## Authenticate to MotherDuck During development, you can configure the connection manually through the UI by opening "Settings"; if you are running Evidence locally, this is typically at [http://localhost:3000/settings](http://localhost:3000/settings). ![img](../img/evidence_settings.png) Then select 'DuckDB' as the connection type, and as the filename, use `'md:?motherduck_token=xxxx'` where `xxxx` is your [access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-an-access-token). Finally, for the extension, select "No extension" and click `Save`. ![img](../img/evidence_duckdb.png) In production, you can set [global environments](https://docs.evidence.dev/deployment/environments#prod-environment); you need to set two environment variables: - `EVIDENCE_DUCKDB_FILENAME='md:?motherduck_token=xxxx'` - `EVIDENCE_DATABASE=duckdb` ## Displaying some data through SQL and Markdown Once done, you can add a new page in the `pages` folder and add the following code blocks to a `stackoverflow.md` file: First, we simply add some Markdown headers. ```md --- title: Evidence & MotherDuck --- # Stories with most score ``` Then, we query our data from the [Hacker News sample_data database](/getting-started/sample-data-queries/hacker-news.md) in MotherDuck. The query fetches the top stories (posts) from Hacker News: ```sql new_items SELECT id, title, score, "by", strftime('%Y-%m-%d', to_timestamp(time)) AS date FROM sample_data.hn.hacker_news WHERE type = 'story' ORDER BY score DESC LIMIT 20; ``` Finally, we use the reference of that query result, `new_items`, to create a list that will be generated in Markdown. The list contains the title (with the URL of the story), the date, the score, and the author of the story. ```md {#each new_items as item} * [{item.title}](https://news.ycombinator.com/item?id={item.id}) {item.date} ⬆ {item.score} by [{item.by}](https://news.ycombinator.com/user?id={item.by}) {/each} ``` Then head over to the page you created, and you should see a final result that looks like this: ![img](../img/evidence_hackernews.png) --- --- sidebar_position: 1 title: Hex --- import Image from '@theme/IdealImage'; [Hex](https://hex.tech/) is a software platform for collaborative data science and analytics using Python, SQL and no-code. You have two ways to connect to MotherDuck using Hex: - **Using SQL cells with a data connection**: MotherDuck is a supported [data connection in Hex](https://learn.hex.tech/docs/connect-to-data/data-connections/data-connections-introduction#supported-data-sources). - **Using Python cells**: You can use Python cells to connect to MotherDuck and query data using DuckDB. ## Using SQL cells with a data connection To add a new data connection, head over to the Data browser in a new notebook and click on `Add data connection`. Select `MotherDuck` as the data source and fill in the required fields. The most important field is the MotherDuck token, which you can find in the [MotherDuck UI](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token). Once done, you can use the data browser to explore the tables and columns and directly specify your data connection in your SQL cell.
![hex_data_browser](../img/hex_data_browser_2.png) ![hex_sql_cell](../img/hex_sql_cell.png) ### Query some data Add another cell and run the same query that we'll run in a Python cell below: ```sql SELECT dayname(tpep_pickup_datetime) AS day_of_week, strftime('%H', tpep_pickup_datetime) AS hour_of_day, COUNT(*) AS trip_count FROM sample_data.nyc.taxi GROUP BY day_of_week, hour_of_day ORDER BY day_of_week, hour_of_day; ``` This produces both a table and a DataFrame, which you can use in the same manner as we demonstrate with Python below to generate data visualizations. ![hex_sql_result](../img/hex_sql_result.png) ## Using Python cells If you prefer programming in Python, you can use Python cells to connect to MotherDuck and start querying data. You can jump directly into the [Hex notebook](https://app.hex.tech/c0083b53-a04f-47b1-bff7-a9ff12590a9f/hex/5c85b3e2-3df7-4011-87a0-1fff63787d03/draft/logic) for a quickstart. The notebook highlights how you can query data using Python or SQL cells and display charts! ### Storing your MotherDuck token The first step is to safely store your MotherDuck token. You can do this by [creating a new secret in Hex.](https://learn.hex.tech/docs/environment-configuration/environment-views#secrets) ![Hex secrets](../img/hex_secrets.png) Let's add your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md#authentication-using-an-access-token) under the name `motherduck_token`. ![Hex secrets2](../img/hex_secrets_2.png) Once done, add the following Python cell to export your `motherduck_token` as an environment variable. It will be picked up by SQL/Python cells when authenticating to MotherDuck. ```python # Passing the secrets as environment variable for Python/SQL cell auth # Fill in your token as a Hex project secret https://learn.hex.tech/docs/environment-configuration/environment-views#secret import os os.environ["motherduck_token"] = motherduck_token ``` ### Connecting to MotherDuck Connecting to MotherDuck is straightforward, as DuckDB comes pre-installed in the Hex environment! Add a Python cell and run the following code: ![Hex add cell](../img/hex_add_cell.png) ```python import duckdb # Connect to MotherDuck conn = duckdb.connect('md:') ``` ### Query some data and display a chart We can now easily query some data from the [sample_data database](/getting-started/sample-data-queries/datasets.mdx). We will run a simple query and return it as a pandas DataFrame in order to display it as a chart. This database is auto-attached to any MotherDuck user, so you can query it directly. Add another Python cell and run the following code: ```python # Query sample_data database and convert it to a pandas dataframe for dataviz peak_hours = conn.sql(""" SELECT dayname(tpep_pickup_datetime) AS day_of_week, strftime('%H', tpep_pickup_datetime) AS hour_of_day, COUNT(*) AS trip_count FROM sample_data.nyc.taxi GROUP BY day_of_week, hour_of_day ORDER BY day_of_week, hour_of_day;""").to_df() ``` Now we can display the chart using the Visualization cell. Add a new Visualization cell, type `Chart`, and select the DataFrame we just created, `peak_hours`. ![Hex chart](../img/hex_chart_df.png) Finally, play with the parameters to obtain the following chart, which gives you a weekly view of the peak hours in New York City for the yellow cabs.
![Hex chart peak hours](../img/hex_chart_peak_hours.png) --- --- title: Business Intelligence Tools description: Use MotherDuck as a data source in tools for interactive data analysis and presentation --- import DocCardList from '@theme/DocCardList'; # Business Intelligence Tools MotherDuck integrates with popular business intelligence tools to help you analyze and visualize your data. --- --- sidebar_position: 7 title: Marimo --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; # marimo [marimo](https://marimo.io/) is a reactive notebook for Python and SQL that models notebooks as dataflow graphs. When you run a cell or interact with a UI element, marimo automatically runs affected cells (or marks them as stale), keeping code and outputs consistent and preventing bugs before they happen. Every marimo notebook is stored as pure Python, executable as a script, and deployable as an app. ## Getting Started ### Installation First, install marimo with SQL support: ```bash pip install "marimo[sql]" ``` ```bash uv pip install "marimo[sql]" ``` ```bash conda install -c conda-forge marimo duckdb polars ``` ### Authentication There are two ways to authenticate: 1. **Interactive Authentication**: When you first connect to MotherDuck (e.g. `ATTACH 'md:my_db'`), marimo will open a browser window for authentication. 2. **Token-based Authentication**: Set your MotherDuck token as an environment variable: ```bash export motherduck_token="your_token" ``` You can find your token in the MotherDuck UI under Account Settings. ## Using MotherDuck To begin, open a notebook: ```bash marimo edit my_notebook.py ``` ### 1. Connecting and Database Discovery ```sql ATTACH IF NOT EXISTS 'md:my_db' ``` ```python import duckdb # Connect to MotherDuck duckdb.sql("ATTACH IF NOT EXISTS 'md:my_db'") ``` You will be prompted to authenticate with MotherDuck when you run the above cell. This will open a browser window where you can log in and authorize your marimo notebook to access your MotherDuck database. In order to avoid being prompted each time you open a notebook, you can set the `motherduck_token` environment variable: ```bash export motherduck_token="your_token" marimo edit my_notebook.py ``` Once connected, your MotherDuck tables are automatically discovered in the Datasources Panel:
![img](../img/marimo_motherduck_db_discovery.png)
Browse your MotherDuck databases
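If you prefer to verify the connection from code rather than the panel, here is a minimal sketch (assuming a database named `my_db` exists in your MotherDuck account and the `motherduck_token` environment variable is set) that lists the tables the notebook can now see:

```python
import duckdb

# Attach the MotherDuck database (a no-op if it is already attached),
# then list every table visible to this connection.
duckdb.sql("ATTACH IF NOT EXISTS 'md:my_db'")
print(duckdb.sql("SHOW ALL TABLES"))
```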
### 2. Writing SQL Queries You can query your MotherDuck database using SQL cells in marimo. Here's an example of how to query a table and display the results:
![img](../img/marimo_motherduck_sql.png)
Query a MotherDuck table
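Since every marimo notebook is stored as pure Python, a SQL cell like the one in the screenshot is serialized as a call to `mo.sql()`. As a rough sketch of what such a cell looks like in the notebook file (the query itself is illustrative and assumes the `sample_data` share from the sample-datasets docs is attached):

```python
import marimo as mo

# The SQL cell's result is bound to a dataframe (top_stories here)
# that downstream cells can reference by name.
top_stories = mo.sql(
    f"""
    SELECT title, score
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
    ORDER BY score DESC
    LIMIT 10
    """
)
```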
marimo's reactive execution model extends into SQL queries, so changes to your SQL will automatically trigger downstream computations for dependent cells (or optionally mark cells as stale for expensive computations). ![img](../img/marimo_motherduck_reactivity-ezgif.com-speed.gif) ### 3. Mixing SQL and Python marimo allows you to seamlessly combine SQL queries with Python code:
![img](../img/marimo_motherduck_python_and_sql.png)
Mixing SQL and Python
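As a minimal sketch of the pattern (variable and table names are illustrative, and it again assumes the `sample_data` share is attached), a Python cell can define a value that a SQL cell interpolates, and the query result flows back into Python as a dataframe:

```python
import marimo as mo

# Python cell: a plain variable the SQL cell below depends on.
min_score = 500

# SQL cell: marimo interpolates {min_score} into the query and re-runs
# this cell whenever min_score changes.
big_stories = mo.sql(
    f"""
    SELECT title, score
    FROM sample_data.hn.hacker_news
    WHERE type = 'story' AND score >= {min_score}
    """
)

# Python cell: the result is an ordinary dataframe.
print(f"{len(big_stories)} stories scored at least {min_score}")
```

In a real notebook each of these snippets would live in its own cell; marimo tracks the dependencies between them automatically.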
## Example Notebook For a full example of using MotherDuck with marimo, check out this [example notebook](https://github.com/marimo-team/marimo/blob/main/examples/sql/connect_to_motherduck.py). --- --- sidebar_position: 5 title: Metabase --- [Metabase](https://www.metabase.com/) is an open source analytics/BI platform that provides intuitive data visualization and exploration capabilities. This guide details how to connect Metabase to both local DuckDB databases and MotherDuck. ## Prerequisites - Metabase installed (self-hosted) - Admin access to your Metabase instance - For MotherDuck connections: a valid MotherDuck token ## Installing the DuckDB/MotherDuck Driver ### Self-hosted Metabase 1. Download the [latest driver release](https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases) 2. Copy the downloaded `.jar` file into your Metabase plugins directory: - Standard installation: If your `metabase.jar` is located at `~/app/metabase.jar`, place the driver in `~/app/plugins/` - Mac App: The plugins directory is `~/Library/Application Support/Metabase/Plugins/` 3. Restart your Metabase instance for the new driver to be detected ### Metabase Cloud **Coming soon!** Support for Metabase Cloud is under development. ## Connecting to DuckDB/MotherDuck After installing the driver, you can add DuckDB or MotherDuck as a data source in Metabase. 1. Log in to Metabase with admin credentials 2. Navigate to **Admin Settings** > **Databases** > **Add Database** 3. Select **DuckDB** as the database type :::note Since DuckDB does not do implicit casting by default, the `old_implicit_casting` config is currently necessary for datetime filtering in Metabase to function. It's recommended to keep it set. ::: ### Connecting to MotherDuck To connect to MotherDuck: 1. **Database name**: Enter `md:[database_name]` where `[database_name]` is your MotherDuck database name 2. **MotherDuck token**: Paste your MotherDuck token (retrieve it from the [MotherDuck UI](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md)) 3. **Configuration**: Enable `old_implicit_casting` (recommended) for proper datetime handling ![Example](../img/metabase_motherduck.png) ## Configuration Best Practices - **Connection pooling**: For production instances, set an appropriate connection pool size based on expected concurrent users - **Query timeouts**: Configure timeouts in Metabase settings to prevent long-running queries from affecting system performance - **Data access**: Use database-level permissions in Metabase to control who can access which data sources ## Troubleshooting | Issue | Solution | |-------|----------| | Driver not detected | Ensure the driver is in the correct plugins directory and Metabase has been restarted | | Connection failures | Verify the database path (local) or database name and token (MotherDuck) | | Permission errors | Check file permissions for local databases | | Datetime filtering issues | Enable `old_implicit_casting` in the connection settings | | MotherDuck token errors in the connection string | Specify a valid MotherDuck token and a correct MotherDuck database name after the `md:` prefix | ### Connecting to a Local DuckDB database To connect to a local DuckDB database: 1. **Database file**: Enter the full path to your DuckDB file (e.g., `/path/to/database.db`) 2. **Configuration**: Enable `old_implicit_casting` (recommended) to ensure proper datetime filtering 3.
**Additional settings**: - **Read only**: Toggle as appropriate for your use case - **Naming strategy**: Choose your preferred table/field naming strategy :::note DuckDB's concurrency model supports either one process with read/write permissions, or multiple processes with read permissions, but not both at the same time. This means you will not be able to open a local DuckDB in read-only mode, then the same DuckDB in read-write mode in a different process. ::: ![Example](../img/metabase_local_duckdb.png) --- --- sidebar_position: 6 sidebar_label: Microsoft Power BI title: Power BI with DuckDB and MotherDuck --- [Power BI](https://www.microsoft.com/en-us/power-platform/products/power-bi) is an interactive data visualization product developed by Microsoft. MotherDuck has built an open-source [DuckDB Power Query Connector](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector) that you can use to connect Power BI to DuckDB and MotherDuck. ## Installing 1. Download the latest DuckDB ODBC driver from the [DuckDB Power Query Connector GitHub Releases](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/releases) for Windows: - [duckdb_odbc-windows-amd64.zip](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/releases/latest/download/duckdb_odbc-windows-amd64.zip) 2. Extract the `.zip` archive into a permanent location, such as `C:\Program Files\duckdb_odbc`. Run `odbc_install.exe` - if Windows displays a security warning, click "More information" then "Run Anyway". 3. Optionally, verify the installation in the Registry Editor: - Open Registry Editor by running `regedit` - Navigate to `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI\DuckDB` - Confirm the Driver field shows your installed version - If incorrect, delete the `DuckDB` registry key and reinstall 4. Configure Power BI security settings to allow loading of custom extensions: - Go to File -> Options and settings -> Options -> Security -> Data Extensions - Enable "Allow any extensions to load without validation or warning" - ![Dialog window showing Power BI Options -> Security -> Data Extensions](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/raw/main/images/power_bi_options.png) 5. Download the latest version of the DuckDB Power Query extension: - [duckdb-power-query-connector.mez](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/releases/latest/download/duckdb-power-query-connector.mez) 6. Create the Custom Connectors directory if it does not yet exist: - Navigate to `[Documents]\Power BI Desktop\Custom Connectors` - Create this folder, if it doesn't exist - Note: If this location does not work, you may need to place this in your OneDrive Documents folder instead 7. Copy the `duckdb-power-query-connector.mez` file into the Custom Connectors folder 8. Restart Power BI Desktop ## How to use with Power BI 1. In Power BI Desktop, click "Get Data" -> "More..." 2. Search for "DuckDB" in the connector search box and select the DuckDB connector ![Find DuckDB connector](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/raw/main/images/find-connector.png) 3. For MotherDuck connections, you'll need to provide: - Database Location: Use the `md:` prefix followed by your database name (e.g., `md:my_database`).
This can also be a local file path (e.g., `~\my_database.db`) - MotherDuck Token: Get your token from [MotherDuck's token page](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token) - Read Only (Optional): Set to `true` if you only need read access ![Connect to your MotherDuck database](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/raw/main/images/connect-duckdb.png) 4. Click "OK". 5. Click "Connect". ![Connect dialog](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/raw/main/images/connect.png) 6. Select the table(s) you want to import. Click "Load". ![Navigator dialog to preview and select your table(s)](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/raw/main/images/navigator.png) 7. You can now query your data and create visualizations! ![Power BI example usage](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/raw/main/images/power-bi-example.png) 8. After connecting, you can: - Browse and select tables from your MotherDuck or DuckDB database - Use "Transform Data" to modify your queries before loading - Write custom SQL queries using the "Advanced Editor" - Import multiple tables in one go 9. Power BI will maintain the connection to your MotherDuck or DuckDB database, allowing you to: - Refresh data automatically or on-demand - Create relationships between tables - Build visualizations and dashboards - Share reports with other users (requires proper gateway setup) ## Use custom data connectors with an on-premises data gateway You can use custom data connectors with an on-premises data gateway to connect to data sources that are not supported by default. To do this, you need to install the on-premises data gateway and configure it to use the custom data connector. For more information, see [Use custom data connectors with an on-premises data gateway in Power BI](https://learn.microsoft.com/en-us/power-bi/connect-data/service-gateway-custom-connectors). It should be noted that there are some limitations with using a custom connector with an on-premises data gateway: - Make sure the folder you create is accessible to the background gateway service. Typically, folders under your users' Windows folders or system folders aren't accessible. The on-premises data gateway app shows a message if the folder isn't accessible. This limitation doesn't apply to the on-premises data gateway (personal mode). - If your custom connector is on a network drive, include the fully qualified path in the on-premises data gateway app. - You can only use one custom connector data source when working in DirectQuery mode. Multiple custom connector data sources don't work with DirectQuery. ## Additional information - [Power BI Documentation](https://learn.microsoft.com/en-us/power-bi/connect-data/) - [DuckDB Power Query Connector](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector) ## Troubleshooting ### Missing VCRUNTIME140.dll If you receive an error about missing `VCRUNTIME140.dll`, you need to install the Microsoft Visual C++ Redistributable. You can download it from [Microsoft's download page](https://www.microsoft.com/en-us/download/details.aspx?id=52685). ### Visual C++ and ODBC Issues :::note These steps are particularly relevant for Windows Server environments, especially for Windows Server 2019, but may also help resolve issues on other Windows versions.
::: If you encounter issues with ODBC connectivity or receive errors related to Visual C++ libraries, try these troubleshooting steps: 1. Reinstall the Microsoft Visual C++ Redistributable: - Download the latest version from [Microsoft's official website](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170) for your architecture - Run the installer with administrator privileges - Restart your computer after installation - Try connecting to MotherDuck again 2. If you're still experiencing issues, you can use the ODBC Test tool to diagnose the connection: - Open the ODBC Test tool (typically available in the Windows SDK) - Look for a dropdown menu labeled "hstmt 1: ..." - Select this option to run test queries - If queries work in the ODBC Test tool but not in Power BI, this indicates a Power BI-specific configuration issue If you continue to experience problems after trying these steps, please: - Verify that your MotherDuck token is valid and hasn't expired - Check that your network allows connections to MotherDuck's services - Ensure you have the latest version of the DuckDB Power Query Connector installed If you're still experiencing issues, please reach out to us at [support@motherduck.com](mailto:support@motherduck.com) and we'll be happy to help you troubleshoot the issue. --- --- sidebar_position: 4 title: Superset & Preset --- [Apache Superset](https://superset.apache.org/) is a powerful, open-source data exploration and visualization platform designed to be intuitive and interactive. It allows data professionals to quickly integrate and analyze data from various sources, creating insightful dashboards and charts for better decision making. [Preset](https://preset.io/) is a cloud-native, user-friendly platform built on Apache Superset. It offers enhanced capabilities and managed services to leverage the power of Superset without needing to handle installation and maintenance. In this guide, we'll cover how you can use MotherDuck with either Superset or Preset. ## Superset ### Setup The easiest way to get started locally with Superset is to use their [docker-compose configurations](https://superset.apache.org/docs/installation/installing-superset-using-docker-compose/). ### Adding a database connection to MotherDuck To make it work with DuckDB & MotherDuck, you will have to install an extra Python package, the DuckDB SQLAlchemy driver [duckdb-engine](https://github.com/Mause/duckdb_engine). You can follow the steps [here](https://github.com/apache/superset/tree/master/docker#local-packages) to install additional packages before launching the `docker-compose`. Once done, you can now add the database connection. 1. Head over to "Settings" and click on "Database Connections" ![setting](../img/superset_1.png) 2. Click on "+ Database" ![adddb](../img/superset_2.png) 3. In the Dropdown, pick "DuckDB" ![addduckdb1](../img/superset_3.png) :::note If DuckDB isn't listed, there's probably an error in the installation of the `duckdb-engine`. Review the installation steps to install this extra Python package. ::: 4. Enter the SQLAlchemy URI to MotherDuck that follows this pattern: ```bash duckdb:///md:?motherduck_token= ``` ![sqlalchemy](../img/superset_4.png) :::info Database name is **optional**, so you can have one connection to MotherDuck and query multiple databases. ::: Finally, you can test that your token/connection is valid by clicking "Test connection", then click "Connect".
Now your MotherDuck database is available in Superset and you can start making some dashboards! ## Preset ### Setup You can register a Preset account for [free](https://preset.io/pricing/) (up to 5 users). Upon your account creation, you will need to create a workspace and will be prompted to connect to your data source. ### Adding a database connection to MotherDuck In Preset, you don't need to do any extra work, as the DuckDB SQLAlchemy driver is already installed and ready for you to select. Following the exact same steps as with Superset: 1. Add a database connection by going to "Settings", then "Database Connections" ![settingspreset](../img/preset_1.png) 2. Click on "+ Database" 3. In the Dropdown, pick "DuckDB" ![addduckdb2](../img/preset_3.png) 4. Enter the SQLAlchemy URI to MotherDuck that follows this pattern: ```bash duckdb:///md:?motherduck_token= ``` ![sqlalchemy2](../img/preset_4.png) Finally, you can test that your token/connection is valid by clicking "Test connection", then click "Connect". Now your MotherDuck database is available in Preset and you can start making some dashboards! :::info Database name is **optional**, so you can have one connection to MotherDuck and query multiple databases. ::: --- --- sidebar_position: 5 title: Tableau --- [Tableau](https://www.tableau.com/) is an analytics/BI platform that can be used as a standalone tool (Tableau Desktop) or as a hosted analytics platform (Tableau Server). ## Tableau DuckDB/MotherDuck Setup 1. Download a [recent version of the DuckDB JDBC driver](https://repo1.maven.org/maven2/org/duckdb/duckdb_jdbc/) and copy it into the Tableau Drivers directory: * MacOS: `~/Library/Tableau/Drivers/` * Windows: `C:\Program Files\Tableau\Drivers` * Linux: `/opt/tableau/tableau_driver/jdbc` 2. Download the signed Tableau connector (aka "Taco" file) from the [latest available release](https://github.com/MotherDuck-Open-Source/duckdb-tableau-connector/releases) and copy it into the Connectors directory: * Desktop Windows: `C:\Users\[YourUser]\Documents\My Tableau Repository\Connectors` * Desktop MacOS: `/Users/[YourUser]/Documents/My Tableau Repository/Connectors` * Server Windows: `C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\Connectors` * Server Linux: `[Your Tableau Server Install Directory]/data/tabsvc/vizqlserver/Connectors` ## Connecting Once the Taco is installed and you have launched Tableau, you can create a new connection by choosing "DuckDB by MotherDuck": ![Tableau connector list](../img/tableau-connector-list.png) ### Local DuckDB database If you wish to connect to a local DuckDB database, select "Local file" as the DuckDB Server option, and use the file picker: ![DuckDB Server dropdown](../img/tableau-connect-options-local-file.png) ![Connection Dialogue](../img/tableau-connect-local-file.png) ### In-Memory Database The driver can be used with an in-memory database by selecting the `In-memory database` DuckDB Server option. ![DuckDB Server dropdown](../img/tableau-connect-options-in-memory.png) The data will then need to be provided by an Initial SQL statement, e.g.: ```sql CREATE VIEW my_parquet AS SELECT * FROM read_parquet('/path/to/file/my_file.parquet'); ``` You can then access it by using the Tableau Data Source editing controls. ### MotherDuck To connect to MotherDuck, you have two authentication options: * Token -- provide the value that you [get from the MotherDuck UI](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token).
* No Authentication -- unless the `motherduck_token` environment variable is available to Tableau at startup, you will be prompted to authenticate at connection time. To work with a MotherDuck database in Tableau, you have to provide the database to use when issuing queries. In the `MotherDuck Database` field, provide the name of your database. You don't have to prefix it with `md:`: ![DuckDB Server dropdown](../img/tableau-connect-options-md.png) ![Connection Dialogue](../img/tableau-connect-motherduck.png) ## Additional information * [Tableau Documentation](https://help.tableau.com/current/pro/desktop/en-us/gettingstarted_overview.htm) * [Tableau Exchange Connector DuckDB/MotherDuck](https://exchange.tableau.com/en-gb/products/1021) * [DuckDB Tableau Connector](https://github.com/MotherDuck-Open-Source/duckdb-tableau-connector/) --- --- sidebar_position: 1 title: Amazon S3 --- import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; ## Configure Amazon S3 credentials You can safely store your Amazon S3 credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command. ### Create a SECRET object ```sql -- to configure a secret manually: CREATE SECRET IN MOTHERDUCK ( TYPE S3, KEY_ID 'access_key', SECRET 'secret_key', REGION 'us-east-1' ); ``` :::note When creating a secret using the `CONFIG` (default) provider, be aware that the credential might be temporary. If so, a `SESSION_TOKEN` field also needs to be set for the secret to work correctly. ::: ```sql -- to store a secret configured through `aws configure`: CREATE SECRET aws_secret IN MOTHERDUCK ( TYPE S3, PROVIDER credential_chain ); ``` ```sql -- test the S3 credentials SELECT count(*) FROM 's3:///'; ``` ```python import duckdb con = duckdb.connect('md:') con.sql("CREATE SECRET IN MOTHERDUCK (TYPE S3, KEY_ID 'access_key', SECRET 'secret_key', REGION 'us-east-1')") # testing that our S3 credentials work con.sql("SELECT count(*) FROM 's3:///'").show() # 42 ``` Click on your profile to access the `Settings` panel and click on the `Secrets` menu. ![menu_1](./img/settings_access.png) ![menu_2](./img/settings_panel.png) Then click on `Add secret` in the secrets section. ![menu_3](./img/settings_secrets_panel.png) You will then be prompted to enter your Amazon S3 credentials. ![menu_3](./img/settings_secrets_pop_up.png) You can update your secret by executing the [CREATE OR REPLACE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command to overwrite your secret. ### Delete a SECRET object You can use the same method as above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command. ```sql DROP SECRET ; ``` Click on your profile and access the `Settings` menu. Click on the bin icon to delete your current secrets. ![menu_4](./img/secrets_delete_4.png) ### Amazon S3 credentials as temporary secrets MotherDuck supports DuckDB syntax for providing S3 credentials. ```sql CREATE SECRET ( TYPE S3, KEY_ID 's3_access_key', SECRET 's3_secret_key', REGION 'us-east-1' ); ``` :::note Local/In-memory secrets are not persisted across sessions.
::: --- --- sidebar_position: 1 title: Azure Blob Storage --- import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; ## Configure Azure Blob Storage Credentials You can safely store your Azure Blob Storage credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command. :::note See the [Azure docs](https://learn.microsoft.com/en-gb/azure/storage/common/storage-configure-connection-string#configure-a-connection-string-for-an-azure-storage-account) to find the correct connection string format. ::: ### Create a SECRET object ```sql -- to configure a secret manually: CREATE SECRET IN MOTHERDUCK ( TYPE AZURE, CONNECTION_STRING '[your_connection_string]' ); ``` ```sql -- to store a secret configured through `az configure`: CREATE SECRET az_secret IN MOTHERDUCK ( TYPE AZURE, PROVIDER credential_chain, ACCOUNT_NAME 'some-account' ); ``` ```sql -- test the Azure credentials SELECT count(*) FROM 'azure://[container]/[file]'; SELECT * FROM 'azure://[container]/*.csv'; ``` ```python import duckdb con = duckdb.connect('md:') con.sql("CREATE SECRET IN MOTHERDUCK (TYPE AZURE, CONNECTION_STRING '[your_connection_string]')") # testing that our Azure credentials work con.sql("SELECT count(*) FROM 'azure://[container]/[file]'").show() con.sql("SELECT * FROM 'azure://[container]/*.csv'").show() ``` Click on your profile to access the `Settings` panel and click on the `Secrets` menu. ![menu_1](./img/settings_access.png) ![menu_2](./img/settings_panel.png) Then click on `Add secret` in the secrets section. ![menu_3](./img/settings_secrets_panel.png) You will then be prompted to enter your Azure Blob Storage credentials. ![menu_3](./img/secrets_add_azure.png) ### Delete a SECRET object You can use the same method as above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command. ```sql DROP SECRET ; ``` Click on your profile and access the `Settings` menu. Click on the bin icon to delete the secret. ![menu_4](./img/secrets_delete_azure.png) ### Azure credentials as temporary secrets MotherDuck supports DuckDB syntax for providing Azure credentials. ```sql CREATE SECRET ( TYPE AZURE, CONNECTION_STRING '[your_connection_string]' ); ``` or, if you use the `az configure` command to store your credentials in the `az` CLI: ```sql CREATE SECRET az_secret ( TYPE AZURE, PROVIDER credential_chain, ACCOUNT_NAME 'some-account' ); ``` :::note Local/In-memory secrets are not persisted across sessions. ::: --- --- sidebar_position: 1 title: Cloudflare R2 --- import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; ## Configure Cloudflare R2 credentials You can safely store your Cloudflare R2 credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command. :::note See the [Cloudflare docs](https://developers.cloudflare.com/r2/api/s3/tokens/) to create a Cloudflare access token.
::: ### Create a SECRET object ```sql CREATE SECRET IN MOTHERDUCK ( TYPE R2, KEY_ID 'your_key_id', SECRET 'your_secret_key', ACCOUNT_ID 'your_account_id' ); ``` :::note The `account_id` can be found when generating the API token, in the endpoint URL `https://.r2.cloudflarestorage.com` ::: ```sql -- test the R2 credentials SELECT count(*) FROM 'r2://[bucket]/[file]' ``` ```python import duckdb con = duckdb.connect('md:') con.sql("CREATE SECRET IN MOTHERDUCK ( TYPE R2, KEY_ID 'your_key_id', SECRET 'your_secret_key', ACCOUNT_ID 'your_account_id' )") # testing that our R2 credentials work con.sql("SELECT count(*) FROM 'r2://[bucket]/[file]'").show() ``` Click on your profile to access the `Settings` panel and click on the `Secrets` menu. ![menu_1](./img/settings_access.png) ![menu_2](./img/settings_panel.png) Then click on `Add secret` in the secrets section. ![menu_3](./img/settings_secrets_panel.png) Select the Secret Type `R2` and fill in the required fields. ### Delete a SECRET object You can use the same method as above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command. ```sql DROP SECRET ; ``` Click on your profile and access the `Settings` menu. Click on the bin icon to delete the secret. ![menu_4](./img/secrets_delete_azure.png) ### R2 credentials as temporary secrets MotherDuck supports DuckDB syntax for providing R2 credentials. ```sql CREATE SECRET ( TYPE R2, KEY_ID 'your_key_id', SECRET 'your_secret_key', ACCOUNT_ID 'your_account_id' ); ``` :::note Local/In-memory secrets are not persisted across sessions. ::: --- --- sidebar_position: 1 title: Google Cloud Storage --- import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; With MotherDuck, you can access files in a private Google Cloud Storage (GCS) bucket. ## Google Cloud Storage Requirements - [Enable Amazon S3 API interoperability](https://cloud.google.com/storage/docs/interoperability) - Assign a default project for interoperable access - Create an access key and secret ## Configure Google Cloud Storage credentials You can safely store your Google Cloud Storage credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command. ### Create a SECRET object ```sql CREATE SECRET IN MOTHERDUCK ( TYPE GCS, KEY_ID 'access_key', SECRET 'secret_key' ); -- test GCS credentials SELECT count(*) FROM 'gcs:///'; ``` ```python import duckdb con = duckdb.connect('md:') con.sql("CREATE SECRET IN MOTHERDUCK (TYPE GCS, KEY_ID 'access_key', SECRET 'secret_key')") # test GCS con.sql("SELECT count(*) FROM 'gcs:///'").show() # 42 ``` Click on your profile to access the `Settings` panel and click on the `Secrets` menu. ![menu_1](./img/settings_access.png) ![menu_2](./img/settings_panel.png) Then click on `Add secret` in the secrets section. ![menu_3](./img/settings_secrets_panel.png) You will then be prompted to enter your Google Cloud Storage credentials. ![menu_3](./img/settings_secrets_pop_up.png) You can update your secret by executing the [CREATE OR REPLACE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command to overwrite your secret. ### Delete a SECRET object You can use the same method as above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command.
```sql DROP SECRET ; ``` Click on your profile and access the `Settings` menu. Click on the bin icon to delete your current secrets. ![menu_4](./img/secrets_delete_4.png) ### Google Cloud Storage credentials as temporary secrets MotherDuck supports DuckDB syntax for providing GCS credentials. ```sql CREATE SECRET ( TYPE GCS, KEY_ID 'gcs_access_key', SECRET 'gcs_secret_key' ); ``` :::note Local/In-memory secrets are not persisted across sessions. ::: --- --- title: Cloud Storage description: Use MotherDuck with your favorite cloud storage services --- import DocCardList from '@theme/DocCardList'; # Cloud Storage MotherDuck integrates with popular cloud storage services to help you manage and store your data. --- --- title: Data Quality Tools description: Monitor and maintain data quality in MotherDuck --- import DocCardList from '@theme/DocCardList'; # Data Quality Tools Ensure data quality and reliability in MotherDuck using these integrated tools. --- --- title: Data Science & AI description: Use MotherDuck with your favorite data science and AI tools --- import DocCardList from '@theme/DocCardList' # Data Science & AI Tools MotherDuck integrates with popular data science and AI tools to help you build powerful machine learning and AI applications. --- --- title: Databases description: Use MotherDuck with your favorite databases --- import DocCardList from '@theme/DocCardList'; # Databases MotherDuck integrates directly with popular databases to help you build data pipelines and applications. --- --- sidebar_position: 1 title: PostgreSQL --- [PostgreSQL](https://www.postgresql.org) is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later. As explained by DuckDB Labs' Hannes Mühleisen in the [explainer blog post](https://duckdb.org/2022/09/30/postgres-scanner.html): > PostgreSQL is designed for traditional transactional use cases, "OLTP", where rows in tables are created, updated and removed concurrently, and it excels at this. But this design decision makes PostgreSQL far less suitable for analytical use cases, "OLAP", where large chunks of tables are read to create summaries of the stored data. Yet there are many use cases where both transactional and analytical use cases are important, for example when trying to gain the latest business intelligence insights into transactional data. MotherDuck supports two PostgreSQL-native ways to interact with the database: - [The postgres scanner extension for DuckDB](/key-tasks/loading-data-into-motherduck/loading-data-from-postgres) - [pg_duckdb](/concepts/pgduckdb), a PostgreSQL extension that embeds DuckDB in Postgres. --- --- title: Development Tools description: Developer tools and utilities that work with MotherDuck --- import DocCardList from '@theme/DocCardList'; # Development Tools Use MotherDuck with various development tools and utilities to enhance your workflow. --- --- sidebar_position: 1 title: Apache Iceberg --- MotherDuck supports querying data in the [Apache Iceberg format](https://iceberg.apache.org/). The [Iceberg DuckDB extension](https://duckdb.org/docs/extensions/iceberg.html) is loaded automatically when any of the supported Iceberg functions are called.
## Iceberg functions

| Function Name | Description |
| :--- | :--- |
| `iceberg_scan` | Query Iceberg data |
| `iceberg_metadata` | Query Iceberg metadata, such as the snapshot status, data format, and number of records. |
| `iceberg_snapshots` | Information about the snapshots available in the data folder. |

:::note
The available functions are only for reading Iceberg data. Creating or updating data in Iceberg format is not yet supported.
:::

## Examples

```sql
-- query data
SELECT count(*) FROM iceberg_scan('path-to-iceberg-folder', allow_moved_paths=true);

-- query metadata
SELECT * FROM iceberg_metadata('path-to-iceberg-folder', allow_moved_paths=true);

-- query snapshots
SELECT * FROM iceberg_snapshots('path-to-iceberg-folder');
```

### Query Iceberg data stored in Amazon S3

```sql
SELECT count(*) FROM iceberg_scan('s3://<bucket>/<iceberg-folder>', allow_moved_paths=true);
```

:::note
To query data in a secure Amazon S3 bucket, you will need to configure your [Amazon S3 credentials](../../cloud-storage/amazon-s3).
:::

Example using the MotherDuck Iceberg sample dataset:

```sql
SELECT count(*) FROM iceberg_scan('s3://us-prd-motherduck-open-datasets/iceberg/lineitem_iceberg', allow_moved_paths=true);
```

---

---
sidebar_position: 1
title: Delta Lake
---

MotherDuck supports querying data in the [Delta Lake format](https://delta.io/). The [Delta DuckDB extension](https://duckdb.org/docs/extensions/delta.html) is loaded automatically when any of the supported Delta Lake functions are called.

## Delta function

| Function Name | Description | Supported parameters |
| :--- | :--- | :--- |
| `delta_scan` | Query Delta Lake data | All the `parquet_scan` parameters, plus `delta_file_number`. |

:::note
The available functions are only for reading Delta Lake data. Creating or updating data in Delta format is not yet supported.
:::

## Examples

```sql
-- query data
SELECT COUNT(*) FROM delta_scan('path-to-delta-folder');

-- query data with parameters
FROM delta_scan('path-to-delta-folder', delta_file_number=1, file_row_number=1);
```

### Query Delta data stored in Amazon S3

:::warning
At the moment, querying Delta tables stored in Amazon S3 from **public** buckets is not supported.
:::

[Create an S3 secret](/sql-reference/motherduck-sql-reference/create-secret.md) in MotherDuck using the secret manager:

```sql
CREATE SECRET IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 's3_access_key',
    SECRET 's3_secret_key',
    REGION 's3-region'
);
```

Query Delta data stored in S3:

```sql
SELECT count(*) FROM delta_scan('s3://<bucket>/<delta-folder>');
```

:::note
To query data in an Amazon S3 bucket, you will need to configure your [Amazon S3 credentials](../../cloud-storage/amazon-s3).
:::

Example using the MotherDuck Delta sample dataset:

```sql
SELECT COUNT(*) FROM delta_scan('s3://us-prd-motherduck-open-datasets/file_format_demo/delta_lake/dat/out/reader_tests/generated/basic_append/delta');
```
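The same scan can be run from Python through a MotherDuck connection. A minimal sketch (assumes your `motherduck_token` is configured; the path is the sample dataset above):

```python
import duckdb

con = duckdb.connect("md:")

# The first call to delta_scan auto-loads the Delta extension
con.sql("""
    SELECT count(*)
    FROM delta_scan('s3://us-prd-motherduck-open-datasets/file_format_demo/delta_lake/dat/out/reader_tests/generated/basic_append/delta')
""").show()
```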
---

---
title: File Formats
description: Load data into MotherDuck using various file formats
---

import DocCardList from '@theme/DocCardList';

# File Formats

Load data into MotherDuck using various file formats.

---

---
sidebar_position: 999
title: Creating a New Integration
---

Integration with MotherDuck is almost the same as integrating with DuckDB, which means you can do it from any language or framework! There are a few differences:

1) Use the `"md:"` or `"md:my_database"` connection string instead of a local filesystem path.
1) Pass the `motherduck_token` configuration property (through the config dictionary, a connection string parameter, or an environment variable).
1) Pass `custom_user_agent` to identify the new integration.

### User-agent guidelines

* The format is `integration/version(custom-metadata1;custom-metadata2)`, where the version and metadata sections are optional.
* Avoid using spaces in the integration and version sections.
* Multiple custom metadata sections should be separated by semicolons.

Some examples:

* `my-integration`
* `my-integration/2.9.0`
* `my-integration/2.9.0(linux_amd64)`
* `my-integration/2.9.0(linux_amd64;us-east-1)`

## Language / Framework examples

### Python

```python
con = duckdb.connect("md:my_database", config={
    "motherduck_token": token,
    "custom_user_agent": "INTEGRATION_NAME"
})
```

### Python with SQLAlchemy

```python
eng = create_engine("duckdb:///md:my_database", connect_args={
    'config': {
        'motherduck_token': token,
        'custom_user_agent': 'INTEGRATION_NAME'
    }
})
```

### Java / JDBC

```java
Properties config = new Properties();
config.setProperty("motherduck_token", token);
config.setProperty("custom_user_agent", "INTEGRATION_NAME");
Connection mdConn = DriverManager.getConnection("jdbc:duckdb:md:my_database", config);
```

### NodeJS

```javascript
var db = new duckdb.Database('md:my_database', {
    'motherduck_token': token,
    'custom_user_agent': 'INTEGRATION_NAME'
});
```

### Go

```go
db, err := sql.Open("duckdb", "md:my_database?custom_user_agent=INTEGRATION_NAME")
```

## Implementation best practices

If you use DuckDB/MotherDuck in a shared environment where multiple users are served by the same process, the connection string (e.g. the URL for JDBC, or the Database for Python/ODBC) must be unique per user. You can disambiguate the connection string with a unique-per-user substring, for example `md:database_name?user=unique_user_name`. If using the `motherduck_token` in the connection string, make sure not to log it in plaintext.
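For example, a multi-tenant backend might derive the connection string from the authenticated user and keep the token out of it entirely. A minimal sketch (the function and variable names are illustrative, not part of any API):

```python
import duckdb

def connect_for_user(user_name: str, token: str):
    # A unique `user` parameter gives each user their own DuckDB instance;
    # the token goes in the config dict so it never appears in logs.
    return duckdb.connect(
        f"md:my_database?user={user_name}",
        config={"motherduck_token": token},
    )
```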
---

[dlt](https://dlthub.com/docs/intro) is an open-source Python library that loads data from various, often messy data sources into well-structured, live datasets. It offers a lightweight interface for extracting data from REST APIs, SQL databases, cloud storage, Python data structures, and many more. dlt is designed to be easy to use, flexible, and scalable:

* dlt infers schemas and data types, normalizes the data, and handles nested data structures.
* dlt supports a variety of popular destinations and has an interface to add custom destinations to create reverse ETL pipelines.
* dlt can be deployed anywhere Python runs, be it on Airflow, serverless functions, or any other cloud deployment of your choice.
* dlt automates pipeline maintenance with schema evolution and schema and data contracts.

dlt integrates well with DuckDB (which it also uses as a local [cache](https://dlthub.com/blog/dltplus-project-cache-in-early-access)) and therefore with MotherDuck. You can read more about the MotherDuck integration in the [official documentation](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck).

## Authentication

To authenticate with MotherDuck, you have two options:

1. **Environment variable:** export your `motherduck_token` as an environment variable:

```bash
export motherduck_token="your_motherduck_token"
```

2. **Local development:** add the token to `.dlt/secrets.toml`:

```toml
[destination.motherduck.credentials]
password = "my_motherduck_token"
```

## Minimal example

Below is a minimal example of using dlt to load data from a REST API (with fake data) into a DuckDB (MotherDuck) database:

```python
import dlt
from typing import Dict, Iterator, List, Optional, Sequence
import random
from datetime import datetime
from dlt.sources import DltResource


@dlt.source(name="dummy_github")
def dummy_source(repos: Optional[List[str]] = None) -> Sequence[DltResource]:
    """
    A minimal DLT source that generates dummy GitHub-like data.

    Args:
        repos (List[str]): A list of dummy repository names.

    Returns:
        Sequence[DltResource]: A sequence of resources with dummy data.
    """
    if repos is None:
        repos = ["dummy/repo1", "dummy/repo2"]
    return (
        dummy_repo_info(repos),
        dummy_languages(repos),
    )


@dlt.resource(write_disposition="replace")
def dummy_repo_info(repos: List[str]) -> Iterator[Dict]:
    """
    Generates dummy repository information.

    Args:
        repos (List[str]): List of repository names.

    Yields:
        Iterator[Dict]: An iterator over dummy repository data.
    """
    for repo in repos:
        owner, name = repo.split("/")
        yield {
            "id": random.randint(10000, 99999),
            "name": name,
            "full_name": repo,
            "owner": {"login": owner},
            "description": f"This is a dummy repository for {repo}",
            "created_at": datetime.now().isoformat(),
            "updated_at": datetime.now().isoformat(),
            "stargazers_count": random.randint(0, 1000),
            "forks_count": random.randint(0, 500),
        }


@dlt.resource(write_disposition="replace")
def dummy_languages(repos: List[str]) -> Iterator[Dict]:
    """
    Generates dummy language data for repositories in an unpivoted format.

    Args:
        repos (List[str]): List of repository names.

    Yields:
        Iterator[Dict]: An iterator over dummy language data.
    """
    languages = ["Python", "JavaScript", "TypeScript", "C++", "Rust", "Go"]
    for repo in repos:
        # Generate 2-4 random languages for each repo
        num_languages = random.randint(2, 4)
        selected_languages = random.sample(languages, num_languages)
        for language in selected_languages:
            yield {
                "repo": repo,
                "language": language,
                "bytes": random.randint(1000, 100000),
                "check_time": datetime.now().isoformat(),
            }


def run_minimal_example():
    """
    Runs a minimal example pipeline that loads dummy GitHub data to MotherDuck.
    """
    # Define some dummy repositories
    repos = ["example/repo1", "example/repo2", "example/repo3"]

    # Configure the pipeline
    pipeline = dlt.pipeline(
        pipeline_name="minimal_github_pipeline",
        destination="motherduck",
        dataset_name="minimal_example",
    )

    # Create the data source
    data = dummy_source(repos)

    # Run the pipeline with all resources
    info = pipeline.run(data)
    print(info)

    # Show what was loaded
    print("\nLoaded data:")
    print(f"- {len(repos)} repositories")
    print(f"- Languages for {len(repos)} repositories")


if __name__ == "__main__":
    run_minimal_example()
```

dlt revolves around three core concepts:

* Sources: Define where the data comes from.
* Resources: Represent structured units of data within a source.
* Pipelines: Manage the data loading process.

In the example above:

* `dummy_source` defines a source that simulates GitHub-like data.
* `dummy_repo_info` and `dummy_languages` are resources producing repository and language data.
* A pipeline loads this data into MotherDuck.
The core integration with MotherDuck is defined in the pipeline configuration:

```python
pipeline = dlt.pipeline(
    pipeline_name="minimal_github_pipeline",
    destination="motherduck",
    dataset_name="minimal_example",
)
```

Setting `destination="motherduck"` tells dlt to load the data into MotherDuck.

---

---
title: Ingestion
description: Configure MotherDuck as the destination for your data in the following data ingestion tools
---

import DocCardList from '@theme/DocCardList';

# Ingestion Tools

Configure MotherDuck as the destination for your data in the following data ingestion tools.

---

---
title: Integrations
description: Integrations that work with MotherDuck from the modern data stack
sidebar_class_name: integration-icon
---

import { IntegrationsTable } from "./integrations.table.js";
import "./integrations.css";

MotherDuck integrates with many common tools from the modern data stack. If you would like to create a new integration, see [this guide](how-to-integrate).

Below, you will find a comprehensive list of integrations that work with MotherDuck. Each integration includes links to either our own detailed tutorials, the integrator's documentation, or insightful articles and blogs that can help you get started.

:::info
When working with integrations, it may be useful to be aware of the [different connection string parameters](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#using-connection-string-parameters) you can use to connect to MotherDuck.
:::

## Supported Integrations

Use the search box to find specific integrations or click on category tags to filter the table.

:::note
See the [DuckDB documentation](https://duckdb.org/docs/api/overview.html) for the full list of supported client APIs and drivers.
:::

## Diagram: Modern Duck Stack

![img_duck_stack](../img/md-diagram.svg)

---

---
sidebar_position: 1
title: Go driver
---

As of **v1.2.3**, the [go-duckdb driver](https://github.com/marcboeker/go-duckdb) supports MotherDuck out of the box!

To connect, you need a dependency on the driver in your `go.mod` file:

```go
require github.com/marcboeker/go-duckdb v1.2.3
```

Your code can then open a connection using the standard [database/sql](https://pkg.go.dev/database/sql) package, or any other mechanism supported by [go-duckdb](https://github.com/marcboeker/go-duckdb/blob/master/README.md):

```go
db, err := sql.Open("duckdb", "md:my_db")
```

---

---
title: Language APIs & Drivers
description: Connect to MotherDuck using your preferred programming language
---

import DocCardList from '@theme/DocCardList';

# Language APIs & Drivers

Connect to MotherDuck using official drivers and APIs for various programming languages.

---

---
sidebar_position: 1
title: JDBC driver
---

The official [DuckDB JDBC driver](https://duckdb.org/docs/api/java.html) supports MotherDuck out of the box!

To connect, you need a dependency on the driver. For example, in your Maven pom.xml file:

```xml
<dependency>
    <groupId>org.duckdb</groupId>
    <artifactId>duckdb_jdbc</artifactId>
    <version>0.9.1</version>
</dependency>
```

Your code can then create a `Connection` using the `jdbc:duckdb:md:databaseName` connection string format:

```java
Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db");
```

This `Connection` can then be [used directly](https://docs.oracle.com/en/java/javase/17/docs/api/java.sql/java/sql/Connection.html) or through any framework built on `java.sql` JDBC abstractions.
There are currently three ways to authenticate with a valid MotherDuck token:

1) Environment variable `motherduck_token`
2) Passing the token as a connection string parameter:

```java
Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db?motherduck_token=" + token);
```

3) Interactive authentication through a web browser.

See [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md) for more details.

---

---
title: Python
description: Connect to MotherDuck using Python
---

Check out our [Python tutorial](/getting-started/connect-query-from-python/installation-authentication).

---

---
sidebar_position: 3
title: SQLAlchemy with DuckDB and MotherDuck
sidebar_label: SQLAlchemy
---

[SQLAlchemy](https://www.sqlalchemy.org/) is a SQL toolkit and Object-Relational Mapping (ORM) system for Python, providing full support for SQL expression language constructs and various database dialects. Many Business Intelligence tools support SQLAlchemy out of the box. Using the [DuckDB SQLAlchemy driver](https://github.com/Mause/duckdb_engine), we can connect to MotherDuck using an SQLAlchemy URI.

## Install the DuckDB SQLAlchemy driver

```bash
pip install --upgrade duckdb-engine
```

## Configuring the database connection to a local DuckDB database

A local DuckDB database can be accessed using the SQLAlchemy URI:

```bash
duckdb:///path/to/file.db
```

## Configuring the database connection to MotherDuck

The general pattern for the SQLAlchemy URI to access a MotherDuck database is:

```bash
duckdb:///md:<database_name>?motherduck_token=<token>
```

:::info
The database name `<database_name>` in the connection string is **optional**. This makes it possible to query multiple databases with one connection to MotherDuck.
:::

Connecting and authenticating can be done in several ways:

1. If no token is available, the process will direct you to a web login for authentication, which will allow you to obtain a token.

```python
from sqlalchemy import create_engine, text

eng = create_engine("duckdb:///md:my_db")
with eng.connect() as conn:
    result = conn.execute(text("show databases"))
    for row in result:
        print(row)
```

When running the above, you will see something like this to authenticate:

![motherduck login](../img/sqlalchemy_auth.png)

2. The `MOTHERDUCK_TOKEN` is already set as an environment variable:

```python
from sqlalchemy import create_engine, text

eng = create_engine("duckdb:///md:my_db")
with eng.connect() as conn:
    result = conn.execute(text("show databases"))
    for row in result:
        print(row)
```

3. Using a configuration dictionary:

```python
from sqlalchemy import create_engine, text

config = {}
token = 'asdfwerasdf'  # Fill in your token
config["motherduck_token"] = token

eng = create_engine(
    "duckdb:///md:my_db",
    connect_args={'config': config}
)
with eng.connect() as conn:
    result = conn.execute(text("show databases"))
    for row in result:
        print(row)
```

4. Passing the token as a connection string parameter:

```python
from sqlalchemy import create_engine, text

token = 'asdfwerasdf'  # Fill in your token
eng = create_engine(f"duckdb:///md:my_db?motherduck_token={token}")
with eng.connect() as conn:
    result = conn.execute(text("show databases"))
    for row in result:
        print(row)
```

---

---
sidebar_position: 1
title: R
---

[R](https://www.r-project.org/) is a language for statistical analysis. To connect to MotherDuck from an R program, you first need to install DuckDB:

```r
install.packages("duckdb")
```

You'll then need to load the `motherduck` extension and `ATTACH 'md:'` to connect to all of your databases.
To connect to only one database, use the `ATTACH 'md:my_db'` syntax.

```r
library("DBI")
con <- dbConnect(duckdb::duckdb())
dbExecute(con, "LOAD 'motherduck'")
dbExecute(con, "ATTACH 'md:'")
dbExecute(con, "USE my_db")
res <- dbGetQuery(con, "SHOW DATABASES")
print(res)
```

Once connected, any R syntax described in [DuckDB's documentation](https://duckdb.org/docs/api/r.html) should work.

:::note
Extension autoloading is turned off in R duckdb distributions, so `dbdir = "md:"` style connections do not connect to MotherDuck.
:::

## Considerations and limitations

### Windows integration

The MotherDuck extension is not currently available on Windows. As a workaround, you can use [WSL](https://learn.microsoft.com/en-us/windows/wsl/about) (Windows Subsystem for Linux).

---

---
title: Orchestration
description: Orchestrate data pipelines with MotherDuck
---

import DocCardList from '@theme/DocCardList';

# Orchestration Tools

Build and manage data pipelines with MotherDuck using these orchestration tools.

---

---
sidebar_position: 5
title: DataGrip
---

JetBrains [DataGrip](https://www.jetbrains.com/datagrip/) is a cross-platform IDE for working with SQL and NoSQL databases. It includes a DuckDB integration, which makes connecting to MotherDuck easy.

## Connecting to MotherDuck in DataGrip

Start by creating a new data source, selecting DuckDB as the database engine. This opens up the **Data Sources and Drivers** window.

### Token Authentication

To retrieve a MotherDuck token, follow the steps in [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).

In **Data Sources and Drivers > General**, select the **No auth** option for **Authentication**. Then, fill out the **URL** field following this format, replacing `my_db` with your target MotherDuck database or leaving it out if no specific database is targeted:

```sh
jdbc:duckdb:md:[my_db]
```

![config](../img/datagrip_config.png)

In the **Data Sources and Drivers > Advanced** tab, add a variable `motherduck_token` set to the token retrieved in the prior step.

![config](../img/datagrip_token.png)

Click "OK" to begin querying MotherDuck!

:::note
The default schema filtering configuration of DataGrip may hide some of the schemas that exist in your MotherDuck account. Reconfigure it to display all schemas following the [DataGrip documentation](https://www.jetbrains.com/help/datagrip/schemas.html).
:::

---

---
sidebar_position: 5
title: DBeaver
---

[DBeaver Community](https://dbeaver.io/) is a free cross-platform database integrated development environment (IDE). It includes a DuckDB integration, so it is a great choice for querying MotherDuck.

## DBeaver DuckDB Setup

DBeaver uses the official [DuckDB JDBC driver](https://duckdb.org/docs/api/java.html), which supports MotherDuck out of the box!

To install DBeaver and the DuckDB driver, first follow the [DuckDB DBeaver guide](https://duckdb.org/docs/guides/sql_editors/dbeaver). That guide will create a local in-memory DuckDB connection. After completing those steps, follow the steps below to add a MotherDuck connection as well.

## Connecting DBeaver to MotherDuck

### Browser Authentication

Create a new DuckDB connection in DBeaver. When entering the connection string, instead of using `:memory:` for an in-memory DuckDB, use `md:my_db`. Replace `my_db` with the name of the target MotherDuck database as needed.

Clicking either "Test Connection" or "Finish" will open the default browser and display an authorization prompt.
Click "Confirm", then return to DBeaver to begin querying MotherDuck! ### Token Authentication To avoid the authentication prompt when opening DBeaver, a MotherDuck access token can be included as a connection string parameter. To retrieve a token, follow the steps in [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md). Then, create a new DuckDB connection in DBeaver. Include the token as a query string parameter in the connection string following this format, replacing `` with the access token from the prior step, and `my_db` with the target MotherDuck database: ```sh md:my_db?motherduck_token= ``` Click "Finish" to begin querying MotherDuck! --- --- title: SQL IDEs description: Use MotherDuck with your favorite SQL development environments --- import DocCardList from '@theme/DocCardList'; # SQL IDEs Connect to MotherDuck using popular SQL development environments and query editors. --- --- sidebar_position: 1 title: dbt with DuckDB and MotherDuck description: DuckDB and MotherDuck both support using dbt to manage data loading and transformation sidebar_label: dbt --- [Data Build Tool](https://www.getdbt.com/) (dbt) is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouses by defining SQL in model files. It bring the composability of programming languages to SQL while automating the mechanics of updating tables. [dbt-duckdb](https://github.com/jwills/dbt-duckdb) is the adapter which allows dbt to use DuckDB and MotherDuck. The adapter also supports [DuckDB extensions](https://duckdb.org/docs/extensions/overview) and any of the additional [DuckDB configuration options](https://duckdb.org/docs/sql/configuration). ## Installation Since dbt is a Python library, it can be installed through pip: ```pip3 install dbt-duckdb``` will install both `dbt` and `duckdb`. ## Configuration for Local DuckDB This configuration allows you to connect to S3 and perform read/write operations on Parquet files using an AWS access key and secret. `profiles.yml` ```yaml default: outputs: dev: type: duckdb path: /tmp/dbt.duckdb threads: 4 extensions: - httpfs - parquet settings: s3_region: my-aws-region s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}" s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}" target: dev ``` :::tip The `path` attribute specifies where your DuckDB database file will be created. By default, this path is relative to your `profiles.yml` file location. If the database doesn't exist at the specified path, DuckDB will automatically create it. ::: You can find more information about these connections profiles in the [dbt documentation](https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles). ## Configuration for MotherDuck The only change needed for motherduck is the `path:` setting. ```yaml default: outputs: dev: type: duckdb path: "md:my_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}" threads: 4 extensions: - httpfs - parquet settings: s3_region: my-aws-region s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}" s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}" target: dev ``` This assumes that you have setup `MOTHERDUCK_TOKEN` as an environment variable. To know more about how to persist your authentication credentials, read [Authenticating to MotherDuck using an access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-an-access-token). 
If you don't set the `motherduck_token` in your path, you will be prompted to authenticate to MotherDuck when running your `dbt run` command.

![auth_md](../img/auth_dbt.png)

Follow the instructions; this will export the service account variable for the current `dbt run` process.

DuckDB will parallelize a single write query as much as possible, so the gains from running more than one query at a time are minimal on the database side. That being said, our testing indicates that setting `threads: 4` typically leads to the best performance.

## Extra resources

Take a look at our video guide on DuckDB and dbt below, along with the corresponding [demo tutorial on GitHub](https://github.com/mehd-io/dbt-duckdb-tutorial).

---

---
title: Data Transformation
description: Transform your data inside MotherDuck
---

import DocCardList from '@theme/DocCardList';

# Data Transformation

Use MotherDuck to transform your data.

---

---
title: Web Development
description: Build web applications with MotherDuck
---

import DocCardList from '@theme/DocCardList';

# Web Development

Use MotherDuck to power your web applications and services.

---

---
sidebar_position: 1
title: Vercel
description: Hosting a web application with MotherDuck Wasm SDK on Vercel
sidebar_label: Vercel
---

[Vercel](https://vercel.com/) is a cloud platform for static sites and serverless functions. It is a great platform for hosting web applications built with the [MotherDuck Wasm SDK](/docs/key-tasks/data-apps/wasm-client/).

Vercel typically provides two ways to integrate with third-party services:

- Native integration: create a new account on the third-party service and connect it to Vercel. Billing and setup are managed by Vercel.
- Non-native integration (connectable accounts): connect existing third-party accounts to Vercel.

:::info
Vercel supports Native Integration with MotherDuck; support for non-native integration is coming soon.
:::

## Native Integration

To kickstart the integration, you can either start from:

- [Vercel's marketplace](https://vercel.com/marketplace/motherduck), installing the integration from there on an existing Vercel project.
- A new project deployed from [MotherDuck's Vercel template](https://vercel.com/templates/motherduck/placeholder), which includes snippets to get started with MotherDuck and your Next.js project.

### How to install

To install the MotherDuck Native Integration from the Vercel Marketplace:

1. Navigate to the Vercel Marketplace or to the Integrations Console on your Vercel Dashboard.
2. Locate the MotherDuck integration.
3. Click Install.
4. On the Install MotherDuck modal, you are presented with two plan options. ![modal1](./img/vercel1.png)
5. On the next modal, you will be prompted to give your database a name. Note that a new installation will create a new account and database within a new MotherDuck organization. ![modal2](./img/vercel2.png)
6. You are all set! You now have a new account and database within a new organization. In addition, tokens (an [access token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token) and a [read scaling token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#understanding-read-scaling-tokens)) are automatically generated and stored in Vercel's environment variables. ![model3](./img/vercel3.png)

You can head to the `Getting Started` section on the integration page for more information on how to use the integration.
![model4](./img/vercel4.png)

### Project templates

Learn more about how to set up your projects by using the following templates:

- [MotherDuck's Vercel template](https://github.com/MotherDuck-Open-Source/nextjs-motherduck-wasm-analytics-quickstart): a fully-fledged template that includes a Next.js project and a MotherDuck Wasm setup with sample data integration and an interactive data visualization example.
- [MotherDuck's Vercel template minimal](https://github.com/MotherDuck-Open-Source/nextjs-motherduck-wasm-analytics-quickstart-minimal): a minimal template that includes a Next.js project and a MotherDuck Wasm setup with some sample data integration.

---

---
title: Authenticating and connecting to MotherDuck
description: Learn how to authenticate and connect to MotherDuck
---

# Authenticating and connecting to MotherDuck

These pages explain how to connect to MotherDuck using the CLI, Python, JDBC and NodeJS. First, you need to [authenticate to MotherDuck](./authenticating-to-motherduck) via [manual authentication](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#manual-authentication) in the Web UI, or via automatic authentication with an [access token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token). To connect to a MotherDuck database, you can then [create a connection](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/).

import DocCardList from '@theme/DocCardList';

---

---
sidebar_position: 1
title: Authenticating to MotherDuck
description: Authenticate to a MotherDuck account
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Authenticating to MotherDuck

MotherDuck supports two types of authentication:

- Manual authentication, typically used by the MotherDuck UI
- Authentication using an access token, more convenient for Python, the CLI, or other clients

## Manual authentication

The MotherDuck UI authenticates using several methods:

- Google
- GitHub
- Username and password

You can leverage multiple modes of authentication in your account. For example, you can authenticate both via Google and via username and password as you see fit.

To authenticate in the CLI or Python, you will be redirected to an authentication web page. Currently, this happens every session. To avoid having to re-authenticate, you can save your access token, as described in the [Authenticate With an Access Token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token) section.

## Authentication using an access token

If you are using Python or the CLI and don't want to authenticate every session, you can securely save your credentials locally.

### Creating an access token

To create an access token:

- Go to the [MotherDuck UI](https://app.motherduck.com)
- In the top left, click on your organization name and then `Settings`
- Click `+ Create token`
- Specify a name for the token that you'll recognize (like "DuckDB CLI on my laptop")
- Specify the type of token you want. Tokens can be Read/Write (default) or [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/).
- Choose whether you want the token to expire and then click on `Create token`
- Copy the access token to your clipboard by clicking on the copy icon

![access token example](./img/creating_access_token.jpg)
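Once created, a quick way to verify the token from Python is to pass it at connect time. A minimal sketch (`my_db` is a placeholder database name; the next section covers storing the token more securely):

```python
import duckdb

# Placeholder; prefer the motherduck_token environment variable over hardcoding
token = "<paste your access token>"

con = duckdb.connect(f"md:my_db?motherduck_token={token}")
con.sql("SHOW DATABASES").show()
con.close()
```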
### Storing the access token as an environment variable

You can save the access token as `motherduck_token` in your environment variables. An example of setting this in a terminal:

```bash
export motherduck_token='<token>'
```

You can also add this line to your `~/.zprofile` or `~/.bash_profile`, or store it in a `.env` file in your project root. Once this is done, your authentication token is saved and you can connect to MotherDuck with the following connection string:

```bash
duckdb "md:my_db"
```

:::info
This is the best practice for security reasons. The token is sensitive information and should be kept safe. Do not share it with others.
:::

Alternatively, you can specify an access token in the MotherDuck connection string: `md:my_db?motherduck_token=<token>`.

```bash
duckdb "md:my_db?motherduck_token=<token>"
```

When in the DuckDB CLI, you can use the `.open` command and specify the connection string as an argument.

```cli
.open md:my_db?motherduck_token=<token>
```

## Using connection string parameters

### Authentication using SaaS mode

You can limit MotherDuck's ability to interact with your local environment using `SaaS Mode`:

- Disable reading or writing local files
- Disable reading or writing local DuckDB databases
- Disable installing or loading any DuckDB extensions locally
- Disable changing any DuckDB configurations locally

This mode is useful for third-party tools, such as BI vendors, that host DuckDB themselves and require additional security controls to protect their environments.

:::info
Using this parameter requires using `.open` in the DuckDB CLI or `duckdb.connect` in Python. This initiates a new connection to MotherDuck and will detach any existing connection to a local DuckDB database.
:::

```cli
.open md:[<database_name>]?[motherduck_token=<token>]&saas_mode=true
```

```python
conn = duckdb.connect("md:[<database_name>]?[motherduck_token=<token>]&saas_mode=true")
```

### Using attach mode

By default, when you connect to MotherDuck, you will be connected to all databases you have access to. If you want to limit the connection to only one database, you can use `attach_mode` with the value `single`. For example, to connect to a database named `my_database`, run:

```bash
duckdb 'md:my_database?attach_mode=single'
```

:::note
A `<database_name>` that starts with a number cannot be connected to directly. You will need to connect without a database specified and then `CREATE` and `USE` it using a double-quoted name, e.g. `USE DATABASE "1database"`.
:::
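The same single-database connection can be made from Python by passing the parameter in the connection string. A minimal sketch (assumes your token is already configured):

```python
import duckdb

# Only my_database is attached; other databases in the account are not visible
con = duckdb.connect("md:my_database?attach_mode=single")
con.sql("SHOW DATABASES").show()
con.close()
```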
---

---
sidebar_position: 2
title: Connecting to MotherDuck
description: Create one or more connections to a MotherDuck database
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

A single DuckDB connection executes one query at a time, aiming to maximize the performance of that query; this makes reuse of a single connection both simple and performant. We recommend starting with the simplest way of connecting to MotherDuck and running queries, and if that does not meet your requirements, exploring the advanced use cases described in subsequent sections.

## Create a connection

The code snippets below show how to create a connection to a MotherDuck database from the CLI, Python, JDBC and NodeJS language APIs.

To connect to your MotherDuck database, use `duckdb.connect("md:my_database_name")`, which returns a `DuckDBPyConnection` object that you can use to interact with your database.

```python
import duckdb

# Create connection to your default database
conn = duckdb.connect("md:my_db")

# Run queries
conn.sql("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
conn.sql("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
res = conn.sql("SELECT * FROM items")

# Close the connection
conn.close()
```

To connect to your MotherDuck database, you can create a `Connection` using the `"jdbc:duckdb:md:databaseName"` connection string format. For authentication, you need to provide a MotherDuck token. There are two ways to provide the token:

1. As a connection property:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.ResultSet;
import java.util.Properties;

// Create properties with your MotherDuck token
Properties props = new Properties();
props.setProperty("motherduck_token", "<your_token>");

// Create connection to your database
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db", props);
     Statement stmt = conn.createStatement()) {
    stmt.executeUpdate("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
    stmt.executeUpdate("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");
    try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
        while (rs.next()) {
            System.out.println("Item: " + rs.getString(1) + ", count: " + rs.getInt(3));
        }
    }
}
```

2. As part of the connection string:

```java
// Create connection with token in the connection string
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db?motherduck_token=<your_token>");
     Statement stmt = conn.createStatement()) {
    stmt.executeUpdate("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
    stmt.executeUpdate("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");
    try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
        while (rs.next()) {
            System.out.println("Item: " + rs.getString(1) + ", count: " + rs.getInt(3));
        }
    }
}
```

:::info
For security reasons, it's generally recommended to use environment variables to store your MotherDuck token rather than hardcoding it in your application. If an environment variable named `motherduck_token` is set, it will be used automatically.
:::

To connect to your MotherDuck database, you can create a `duckdb.Database` with the `'md:databaseName'` connection string format:

```javascript
const duckdb = require('duckdb');

// Create connection to your default database
const db = new duckdb.Database('md:my_db');

// Run queries
db.all('CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)', function(err, res) {
  if (err) {
    console.warn(err);
    return;
  }
  console.log('table created');
});
```

To connect to your MotherDuck database, run `duckdb "md:<database_name>"`.

```shell
duckdb "md:my_db"
```

You will then enter the DuckDB interactive terminal to interact with your database.

```sql
D CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER);
D INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2);
D SELECT * FROM items;
```

## Multiple Connections and the Database Instance cache

DuckDB clients in Python, R, JDBC, and ODBC prevent redundant reinitialization by keeping instances of database-global context cached by the database path. Other language APIs are likely to get similar functionality over time.
When connecting to MotherDuck, the instance is cached for an additional 15 minutes after the last connection is closed (see [Setting Custom Database Instance Cache TTL](#setting-custom-database-instance-cache-time-ttl) for how to override this value). For an application that creates and closes connections frequently, this could provide a significant speedup for connection creation, as the same catalog data can be reused across connections. This means that only the first of multiple connections to the same database will take the time to load the MotherDuck extension, verify its signature, and fetch the catalog metadata.

```python
con1 = duckdb.connect("md:my_db")  # MotherDuck catalog fetched
con2 = duckdb.connect("md:my_db")  # MotherDuck catalog reused
```

```java
// Create properties with your MotherDuck token
Properties props = new Properties();
props.setProperty("motherduck_token", "<your_token>");

try (var con1 = DriverManager.getConnection("jdbc:duckdb:md:my_db", props); // MotherDuck catalog fetched
     var con2 = DriverManager.getConnection("jdbc:duckdb:md:my_db", props)  // MotherDuck catalog reused
) {
    // ...
}
```

For language APIs that do not yet have a database instance cache, reusing the same database instance will prevent redundant reinitialization:

```typescript
const db = await DuckDBInstance.create('md:my_db');
const con1 = await db.connect();
const con2 = await db.connect();
```

## Setting Custom Database Instance Cache Time (TTL)

By default, connections to MotherDuck established through DuckDB APIs that support database instance caching will reuse the same database instance for 15 minutes after the last connection is closed. In some cases, you may want to make that period longer (to avoid the redundant reinitialization) or shorter (to connect to the same database with a different configuration).

The database TTL value can be set either at initial connection time, or by using the `SET` command at any point. Any valid [DuckDB interval part specifiers](https://duckdb.org/docs/stable/sql/functions/datepart.html#part-specifiers-usable-as-date-part-specifiers-and-in-intervals) can be used for the TTL value, for example '5s', '3m', or '1h'.

:::note
The examples below assume you have configured your MotherDuck token using one of the authentication methods described in the [Create a connection](#create-a-connection) section above.
:::

```python
con = duckdb.connect("md:my_db?dbinstance_inactivity_ttl=1h")
con.close()

# different database connection string (without `?dbinstance_inactivity_ttl=1h`), no instance cached; TTL is 15 minutes (default)
con2 = duckdb.connect("md:my_db")
# allow the database instance to expire immediately
con2.execute("SET motherduck_dbinstance_inactivity_ttl='0s'")
# the database instance can only expire after the last connection is closed
con2.close()

# new database instance with a new TTL (the 15 minute default)
con3 = duckdb.connect("md:my_db")
con3.close()

# the last TTL for this database was 15 minutes; the cached database instance will be reused
con4 = duckdb.connect("md:my_db")
```

The TTL can be set either through the connection string or through Properties. However, be careful when using Properties, as the database instance cache is keyed by the connection string. This means that if you change the TTL in Properties between connections, you'll get an error, because you are trying to connect to the same database with different configurations.
Here's an example that will fail:

```java
Properties props = new Properties();
props.setProperty("motherduck_dbinstance_inactivity_ttl", "2m");

// First connection works fine
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db", props)) {
    // TTL is set to 2m
}

// Changing TTL in properties will fail
props.setProperty("motherduck_dbinstance_inactivity_ttl", "5m");
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db", props)) {
    // This will throw: "Can't open a connection to same database file
    // with a different configuration than existing connections"
}
```

For this reason, it's generally safer to set the TTL through the connection string:

```java
// Set TTL through connection string
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db?dbinstance_inactivity_ttl=1h")) {
    // TTL is set to 1h
}

// Different TTL creates a new instance
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db?dbinstance_inactivity_ttl=30m")) {
    // This works - creates a new instance with 30m TTL
}

// Can also set TTL using SQL
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db");
     var st = con.createStatement()) {
    // allow the database instance to expire immediately
    st.executeUpdate("SET motherduck_dbinstance_inactivity_ttl='0s'");
}
```

:::note
When using Properties, you must include the `motherduck_` prefix for the TTL property name (i.e., `motherduck_dbinstance_inactivity_ttl`). This prefix is only optional when passing the TTL through the connection string.
:::

## Connect to multiple databases

If you need to connect to MotherDuck and run one or more queries in succession on the same account, you can use a [single database connection](#create-a-connection). If you want to connect to another database in the same account, you can either [reuse the same connection](#example-1-reuse-the-same-duckdb-connection) or [create copies](#example-2-create-copies-of-the-initial-duckdb-connection) of the connection.

If you need to connect to multiple databases, you can either directly reuse the same `DuckDBPyConnection` instance, or create copies of the connection using the `.cursor()` method.

:::note
`FROM <table_name>` is a shorthand for `SELECT * FROM <table_name>`.
:::
### Example 1: Reuse the same DuckDB Connection

To connect to different databases in the same MotherDuck account, you can use the same connection object and simply fully qualify the table names in your query.

```python
conn = duckdb.connect("md:my_db")

res1 = conn.sql("FROM my_db1.main.tbl")
res2 = conn.sql("FROM my_db2.main.tbl")
res3 = conn.sql("FROM my_db3.main.tbl")

conn.close()
```

### Example 2: Create copies of the initial DuckDB Connection

`conn.cursor()` returns a copy of the DuckDB connection, with a reference to the existing DuckDB database instance. Closing the original connection also closes all associated cursors.

```python
conn = duckdb.connect("md:my_db")

cur1 = conn.cursor()
cur2 = conn.cursor()
cur3 = conn.cursor()

cur1.sql("USE my_db1")
cur2.sql("USE my_db2")
cur3.sql("USE my_db3")

res = []
for cur in [cur1, cur2, cur3]:
    res.append(cur.sql("SELECT * FROM tbl"))

# This closes the original DuckDB connection and all cursors
conn.close()
```

:::note
`duckdb.connect(path)` creates and caches a DuckDB instance. Subsequent calls with the same path reuse this instance. New connections to the same instance are independent, similar to `conn.cursor()`, but closing one doesn't affect others. To create a new instance instead of using the cached one, make the path unique (e.g., `md:my_db?user=<unique_user_name>`).
:::

### Example 3: Create multiple connections

You can also create multiple connections to the same MotherDuck account using different DuckDB instances. However, keep in mind that each connection takes time to establish; if connection times are an important factor for your application, it might be beneficial to consider [Example 1](#example-1-reuse-the-same-duckdb-connection) or [Example 2](#example-2-create-copies-of-the-initial-duckdb-connection).

:::note
If you need to run queries on separate connections in quick succession, instead of opening and closing a connection for every query, we recommend using a Connection Pool ([Python](/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-python#connection-pooling), [JDBC](/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-jdbc#connection-pooling) or [NodeJS](/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-nodejs#connection-pooling)).
:::

```python
conn1 = duckdb.connect("md:my_db1")
conn2 = duckdb.connect("md:my_db2")
conn3 = duckdb.connect("md:my_db3")

res1 = conn1.sql("SELECT * FROM tbl")
res2 = conn2.sql("SELECT * FROM tbl")
res3 = conn3.sql("SELECT * FROM tbl")

conn1.close()
conn2.close()
conn3.close()
```

If you need to connect to multiple databases, you typically won't need to create multiple DuckDB instances. You can either directly reuse the same `DuckDBConnection` instance, or create copies of the connection using the `.duplicate()` method.

```java
// Create connection with your MotherDuck token
Properties props = new Properties();
props.setProperty("motherduck_token", "<your_token>");

try (DuckDBConnection duckdbConn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:md:my_db", props)) {
    Connection conn1 = duckdbConn.duplicate();
    Connection conn2 = duckdbConn.duplicate();
    Connection conn3 = duckdbConn.duplicate();
    // ...
}
```

If you need to connect to multiple databases, you can use the same `duckdb.Database` instance and create copies of the connection using the `.connect()` method.
```javascript
const db = new duckdb.Database('md:my_db');
const con = db.connect();

con.all('FROM my_db1.main.tbl', function(err, res) {
  if (err) {
    console.warn(err);
    return;
  }
  console.log(res);
});
```

---

---
sidebar_position: 2
title: Multithreading and Parallelism with JDBC and MotherDuck
sidebar_label: JDBC
description: Performance tuning via multithreading with multiple connections to MotherDuck with JDBC
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Multithreading and parallelism with JDBC

Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create [connections in multiple threads](#connections-in-multiple-threads). For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use [connection pooling](#connection-pooling). If you need to run many concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.

## Connections in multiple threads

If you have multiple parallelizable queries you want to run in quick succession, you could benefit from concurrency.

:::note
Concurrency is supported by DuckDB across multiple threads, as described in the [Concurrency](https://duckdb.org/docs/connect/concurrency.html) documentation page. However, be mindful when using this approach, as parallelism does not always lead to better performance. Read the notes on [Parallelism](https://duckdb.org/docs/guides/performance/how_to_tune_workloads.html#parallelism-multi-core-processing) in the DuckDB documentation to understand the specific scenarios in which concurrent queries can be beneficial.
:::

First, let's create a class `MultithreadingExample` and get the MotherDuck token from your environment variables.
```java
package com.example;

import org.duckdb.DuckDBConnection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Examples for multithreading and connection pooling
 */
public class MultithreadingExample {

    private static final String token = System.getenv("motherduck_token");
    private final static Logger logger = LoggerFactory.getLogger(MultithreadingExample.class);
```

To use multiple threads, pass the connection object to each thread, and create a copy of the connection with the `.duplicate()` method to run a query:

```java
    private static void runQueryFromThread(String label, Connection conn, String query) {
        try (Connection dupConn = ((DuckDBConnection) conn).duplicate();
             Statement st = dupConn.createStatement();
             ResultSet rs = st.executeQuery(query)) {
            if (rs.next()) {
                logger.info("{}: found at least one row", label);
            } else {
                logger.info("{}: no rows found", label);
            }
        } catch (SQLException e) {
            throw new RuntimeException("can't run query", e);
        }
    }
```

You can then use a thread pool executor to run the queries using the `runQueryFromThread` method:

```java
    public static void main(String[] args) throws SQLException, InterruptedException {
        // Check that a motherduck_token exists
        if (token == null) {
            throw new IllegalArgumentException(
                    "Please provide `motherduck_token` environment variable");
        }

        // Add MotherDuck token to config
        Properties config = new Properties();
        config.setProperty("motherduck_token", token);

        // Create list of queries to run in multiple threads
        List<String> queries = new ArrayList<>();
        queries.add("SELECT 42;");
        queries.add("SELECT 'Hello World!';");
        int num_queries = queries.size();

        // Create thread pool executor and run queries
        ExecutorService executor = Executors.newFixedThreadPool(num_queries);
        try (Connection mdConn = DriverManager.getConnection("jdbc:duckdb:md:my_db", config)) {
            for (int i = 0; i < num_queries; i++) {
                String label = "query " + i;
                String query = queries.get(i);
                executor.submit(() -> runQueryFromThread(label, mdConn, query));
            }
            executor.shutdown();
            boolean success = executor.awaitTermination(30, TimeUnit.SECONDS);
            if (success) {
                logger.info("successfully ran {} queries in threads", num_queries);
            }
        }
    }
}
```

## Connection pooling

If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts. To run concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.

For connection pools, we recommend using [HikariCP](https://github.com/brettwooldridge/HikariCP). Below is an example implementation. For this implementation, you can connect to a user account by providing a `motherduck_token` in your database path. The goal of this implementation is to distribute operations across multiple databases in a round-robin fashion. This `HikariMultiPoolManager` class manages multiple `HikariDataSource`s (connection pools), each connecting to a different connection URL, and rotates between them when `getConnection()` is called.
You can specify a pool size, which is applied to all `HikariDataSource`s.

```java
package com.example;

import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import org.duckdb.DuckDBConnection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Example DuckDB connection pool implementation
 */
public class HikariMultiPoolManager implements AutoCloseable {

    private static final String token = System.getenv("motherduck_token");
    private final List<HikariDataSource> dataSources;
    private final AtomicInteger index;
    private final static Logger logger = LoggerFactory.getLogger(HikariMultiPoolManager.class);

    public HikariMultiPoolManager(List<String> urls, int maximumPoolSize) {
        // Create Hikari datasources from urls
        this.dataSources = new ArrayList<>();
        for (String url : urls) {
            HikariDataSource ds = new HikariDataSource();
            ds.setMaximumPoolSize(maximumPoolSize);
            ds.setJdbcUrl(url);
            dataSources.add(ds);
        }
        this.index = new AtomicInteger(0);
    }

    public Connection getConnection() throws SQLException {
        int ind = index.getAndIncrement() % dataSources.size();
        HikariDataSource ds = dataSources.get(ind);
        return ds.getConnection();
    }

    public void evict() throws Exception {
        for (HikariDataSource ds : dataSources) {
            HikariPoolMXBean poolBean = ds.getHikariPoolMXBean();
            if (poolBean != null) {
                poolBean.softEvictConnections();
            }
        }
    }

    @Override
    public void close() throws Exception {
        for (HikariDataSource ds : dataSources) {
            ds.close();
        }
    }
```

### How to set `urls`

The `HikariMultiPoolManager` takes a list of `urls` and a `maximumPoolSize` (typically 1 in the examples below). Each path in the list gets a `HikariDataSource` in the pool, which readers can use to query the database(s) they connect to. If your `maximumPoolSize` is larger than 1, the pool will return thread-safe copies of those connections. This gives you a few options for configuring the pool.

:::note
To learn more about database instances and connections, see [Connect to multiple databases](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#connect-to-multiple-databases).
:::

To create a connection pool with 3 connections to **the same database**, pass a single database path and set `maximumPoolSize` to 3:

```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 3);
```

Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.

You can also create multiple connections to **the same database** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish.
Create multiple paths and make them unique by adding `&user=<n>` to the database path:

```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=1");
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=2");
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=3");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
```

Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.

You can also create multiple connections to **separate databases** in **the same MotherDuck account** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path:

```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db1?motherduck_token=" + token + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db2?motherduck_token=" + token + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db3?motherduck_token=" + token + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
```

Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.

You can also create multiple connections to **separate databases** in **separate MotherDuck accounts** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path and token:

```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db1?motherduck_token=" + token1 + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db2?motherduck_token=" + token2 + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db3?motherduck_token=" + token3 + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
```

Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.

### How to run queries with a thread pool

You can then fetch connections from the pool, for example to run queries from a queue. You can use a thread pool executor with 3 workers to fetch connections from the pool and run the queries using a `queryString` method:

```java
    private static String queryString(HikariMultiPoolManager pool, String query) throws SQLException {
        try (Connection conn = pool.getConnection();
             Statement ps = conn.createStatement();
             ResultSet rs = ps.executeQuery(query)) {
            logger.info("connection = {}", conn);
            String res = rs.next() ?
                rs.getString(1) : "[not found]";
        logger.info("Got: {}", res);
        return res;
    }
}

public static void main(String[] args) throws Exception {
    if (token == null) {
        throw new IllegalArgumentException(
                "Please provide `motherduck_token` environment variable");
    }

    List<String> queries = new ArrayList<>();
    // Add queries here, for example:
    queries.add("SELECT 42;");
    queries.add("SELECT 'Hello World!';");

    List<String> urls = new ArrayList<>();
    // Add urls here, for example:
    urls.add("jdbc:duckdb:md:my_db?user=1&motherduck_token=" + token);
    urls.add("jdbc:duckdb:md:my_db?user=2&motherduck_token=" + token);
    urls.add("jdbc:duckdb:md:my_db?user=3&motherduck_token=" + token);

    // Create thread pool and run queries
    try (HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1)) {
        ExecutorService executor = Executors.newFixedThreadPool(urls.size());
        for (String query : queries) {
            executor.submit(() -> queryString(pool, query));
        }
        executor.shutdown();
        boolean success = executor.awaitTermination(30, TimeUnit.SECONDS);
        if (success) {
            logger.info("successfully ran {} queries in threads with connection pool", queries.size());
        }
    }
}
}
```

Reset the connection pool at least once every 24 hours by soft-evicting all connections. This ensures that you are always running on the latest version of MotherDuck.

```java
pool.evict();
```

---

---
sidebar_position: 3
title: Multithreading and Parallelism with NodeJS and MotherDuck
sidebar_label: NodeJS
description: Performance tuning via multithreading with multiple connections to MotherDuck with NodeJS
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Multithreading and parallelism with NodeJS

For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use [connection pooling](#connection-pooling).

Depending on the needs of your data application, you can use thread-based parallelism for improved performance, for example, if the queries are hybrid with CPU-intensive work done locally. To enable thread-based parallelism, you can use [Node worker threads](https://nodejs.org/api/worker_threads.html#worker-threads) with one database connection in each thread.

If you need to run many concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.

## Connection pooling

If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use, so you can avoid the overhead of creating a new database object for each query. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts. To run concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.

For connection pools, we recommend using [generic-pool](https://www.npmjs.com/package/generic-pool) with [duckdb-async](https://www.npmjs.com/package/duckdb-async), overriding the `release` function to destroy a connection that has been in use for too long, which helps optimize resource usage.

First, let's create a file `md_connection_pool.js` to implement the connection pool class.
Note that we are adding a new config option, `recycleTimeoutMillis`, that will help us recreate any connections (active or idle) that have been open for a given time. This is different from `idleTimeoutMillis`, which only destroys idle connections.

```javascript
import { Database } from "duckdb-async";
import * as genericPool from "generic-pool";

export class RecyclingPool extends genericPool.Pool {
  constructor(Evictor, Deque, PriorityQueue, factory, options) {
    super(Evictor, Deque, PriorityQueue, factory, options);
    // New _config options for when to recycle a non-idle connection
    this._config['recycleTimeoutMillis'] =
      (typeof options.recycleTimeoutMillis == 'undefined') ? undefined : parseInt(options.recycleTimeoutMillis);
    // Jitter spreads recycling out over time so connections don't all recycle at once
    this._config['recycleTimeoutJitter'] =
      (typeof options.recycleTimeoutJitter == 'undefined') ? 0 : parseInt(options.recycleTimeoutJitter);
    this._config['motherduckToken'] =
      (typeof options.motherduckToken == 'undefined') ? undefined : options.motherduckToken;
    console.log('Creating a RecyclingPool');
  }

  release(resource) {
    const loan = this._resourceLoans.get(resource);
    const creationTime = typeof loan == 'undefined' ? 0 : loan.pooledResource.creationTime;
    // Subtract a random jitter (up to recycleTimeoutJitter) so that connections
    // created at the same time don't all recycle at the same time.
    const jitter = this._config.recycleTimeoutJitter
      ? Math.floor(Math.random() * this._config.recycleTimeoutJitter) : 0;
    // If the connection has been in use for longer than recycleTimeoutMillis (minus jitter),
    // destroy it instead of releasing it back into the pool.
    // If that deletion brings the pool size below the min, a new connection will
    // automatically be created within the destroy method.
    if (new Date(creationTime + this._config.recycleTimeoutMillis - jitter) <= new Date()) {
      return this.destroy(resource);
    }
    return super.release(resource);
  }
}
```

You can then create an `MDFactory` class to create the connections in the pool, and use it with `createRecyclingPool` (equivalent to the `createPool` function from `generic-pool`).

```javascript
export class MDFactory {
  constructor(opts) {
    this.opts = opts;
  }

  async create() {
    console.log("Creating a connection");
    const connection = await Database.create(`md:my_db?motherduck_token=` + this.opts.motherduckToken);
    // Run any connection initialization commands here.
    // For example, you can set THREADS = 1 if you want to limit duckdb to run on a single thread.
    await connection.all("SET THREADS='1';");
    return connection;
  }

  async destroy(connection) {
    console.log("Destroying a connection");
    return connection.close();
  }
}

export function createRecyclingPool(config) {
  const factory = new MDFactory(config);
  return new RecyclingPool(genericPool.DefaultEvictor, genericPool.Deque, genericPool.PriorityQueue, factory, config);
}
```

To try out the connection pool, you can create a file `md_connection_pool_test.js` that creates a `RecyclingPool` and submits a list of queries. To create the pool instance, first set the configuration options specified by `generic-pool` and pass them to the `createRecyclingPool` function. You can find the list of options in the [docs](https://www.npmjs.com/package/generic-pool). Below are a few example values that we recommend for use with MotherDuck.

```javascript
import { createRecyclingPool } from "./md_connection_pool.js";

// If an idle eviction would bring us below the min pool size, a new connection is made after the eviction.
const opts = {
  max: 10,
  min: 3,
  // Background idle connection destruction process runs every evictionRunIntervalMillis.
  // We don't want all connections to be evicted at the same time, so only destroy one at a time.
  // Connection must be idle for softIdleTimeoutMillis before it is recycled.
  // (Additionally, we implemented recycleTimeoutMillis to also recycle active connections.)
  evictionRunIntervalMillis: 30000,
  numTestsPerEvictionRun: 1,
  softIdleTimeoutMillis: 90000,
  // Do not start to use a connection that is older than 20 minutes. Recreate it first.
  // Set this higher than recycleTimeoutMillis below so that recycling happens proactively
  // rather than delaying query execution.
  idleTimeoutMillis: 1200000,
  // Before returning a resource to the pool, check whether it has existed longer than this
  // timeout, and if so, destroy it. New connections will be added up to the min pool size
  // during the destroy process, so this is proactive rather than reactive.
  recycleTimeoutMillis: 900000,
  // We don't want all the connections to recycle at the same time, so let's randomize it slightly.
  // This number should be smaller than recycleTimeoutMillis.
  recycleTimeoutJitter: 60000,
  // This gets your MotherDuck token from an environment variable.
  motherduckToken: process.env.motherduck_token,
};

const myPool = createRecyclingPool(opts);
```

Then, you can asynchronously acquire connections from the pool and run a list of queries.

```javascript
let promiseArray = [];
let queries = ["SELECT 42", "SELECT 'Hello World!'"];

for (let i = 0; i < queries.length; i++) {
  // Promise is resolved once a resource becomes available
  console.log("Acquire connection from pool");
  promiseArray.push(myPool.acquire());
  promiseArray[i]
    .then(async function (client) {
      console.log("Starting query");
      const results = await client.all(queries[i]);
      console.log("Results: ", results[0]);
      await new Promise(r => setTimeout(r, 200)); // Delay for testing
      // Release the connection (or destroy it if it exceeds recycleTimeoutMillis)
      myPool.release(client);
    })
    .catch(function (err) {
      console.log(err);
    });
}
```

You can easily create additional connection pools that connect to different MotherDuck databases by changing the MotherDuck token.

```javascript
const opts2 = { ...opts, motherduckToken: process.env.motherduck_token_2 };
const myPool2 = createRecyclingPool(opts2);
```

To shut down and stop using a pool, you can optionally run the following code in your application:

```javascript
myPool.drain().then(function () {
  myPool.clear();
});
```

To test the pool, run:

```bash
npm install duckdb-async
npm install generic-pool
export motherduck_token="" # Add your MotherDuck token here
node md_connection_pool_test.js
```

---

---
sidebar_position: 1
title: Multithreading and Parallelism with Python and MotherDuck
sidebar_label: Python
description: Performance tuning via multithreading with multiple connections to MotherDuck with Python
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Multithreading and parallelism with Python

Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create [connections in multiple threads](#connections-in-multiple-threads). For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use [connection pooling](#connection-pooling). If you need to run many concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.

## Connections in multiple threads

If you have multiple parallelizable queries you want to run in quick succession, you could benefit from concurrency.
:::note
Concurrency is supported by DuckDB across multiple Python threads, as described in the [Multiple Python Threads](https://duckdb.org/docs/guides/python/multiple_threads.html) documentation page. However, be mindful when using this approach, as parallelism does not always lead to better performance. Read the notes on [Parallelism](https://duckdb.org/docs/guides/performance/how_to_tune_workloads.html#parallelism-multi-core-processing) in the DuckDB documentation to understand the specific scenarios in which concurrent queries can be beneficial.
:::

A single DuckDB connection [is not thread-safe](https://duckdb.org/docs/api/python/overview.html#using-connections-in-parallel-python-programs). To use multiple threads, pass the connection object to each thread, and create a copy of the connection with the `.cursor()` method to run a query:

```python
import duckdb
from threading import Thread

duckdb_con = duckdb.connect('md:my_db')

def query_from_thread(duckdb_con, query):
    # Each thread gets its own cursor (a copy of the connection)
    cur = duckdb_con.cursor()
    result = cur.execute(query).fetchall()
    print(result)
    cur.close()

queries = ["SELECT 42", "SELECT 'Hello World!'"]

threads = []
for i in range(len(queries)):
    threads.append(Thread(target=query_from_thread,
                          args=(duckdb_con, queries[i]),
                          name='query_' + str(i)))

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
```

## Connection pooling

If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts. To run concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.

For connection pools, we recommend using [SQLAlchemy](https://docs.sqlalchemy.org/14/core/pooling.html). Below is an example implementation. For this implementation, you can connect to a user account by providing a `motherduck_token` in your database path.

```python
import logging
from itertools import cycle
from threading import Lock

import duckdb
import sqlalchemy.pool as pool
from sqlalchemy.engine import make_url

_log = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG)


class DuckDBPool(pool.QueuePool):
    """Connection pool for DuckDB databases (MD or local).

    When you run con = pool.connect(), it will return a cached copy of one of
    the database connections in the pool. When you run con.close(), it doesn't
    close the connection, it just returns it to the pool.

    Args:
        database_paths: A list of unique databases to connect to.
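        pool_size: Optional number of connections to keep in the pool
            (defaults to the number of database paths).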
""" def __init__( self, database_paths, max_overflow=0, timeout=60, reset_on_return=None, *args, **kwargs ): self.database_paths = database_paths self.gen_database_path = cycle(database_paths) self.pool_size = kwargs.pop("pool_size", len(database_paths)) self.lock = Lock() super().__init__( self._next_conn, *args, max_overflow=max_overflow, pool_size=self.pool_size, reset_on_return=reset_on_return, timeout=timeout, **kwargs ) def _next_conn(self): with self.lock: path = next(self.gen_database_path) duckdb_conn = duckdb.connect(path) url = make_url(f"duckdb:///{path}") _log.debug(f"Connected to database: {url.database}") return duckdb_conn ``` ### How to set `database_paths` The `DuckDBPool` takes a list of `database_paths` and an optional input argument `pool_size` (defaults to the number of paths). Each path in the list will get a DuckDB connection in the pool, that readers can use to query the database(s) they connect to. If you have a `pool_size` that is larger than the number of paths, the pool will return thread-safe copies of those connections. This gives you a few options on how to configure the pool. :::note To learn more about database instances and connections, see [Connect to multiple databases](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#connect-to-multiple-databases). ::: To create a connection pool with 3 connections to **the same database**, you can pass a single database path, and set `pool_size=3`: ```python path = "md:my_db?motherduck_token=&access_mode=read_only" conn_pool = DuckDBPool([path], pool_size=3) ``` Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases. You can also create multiple connections to **the same database** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths and make them unique by adding `&user=` to the database path: ```python paths = [ "md:my_db?motherduck_token=&access_mode=read_only&user=1", "md:my_db?motherduck_token=&access_mode=read_only&user=2", "md:my_db?motherduck_token=&access_mode=read_only&user=3", ] conn_pool = DuckDBPool(paths) ``` Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases. You can also create multiple connections to **separate databases** in **the same MotherDuck account** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path: ```python paths = [ "md:my_db1?motherduck_token=&access_mode=read_only", "md:my_db2?motherduck_token=&access_mode=read_only", "md:my_db3?motherduck_token=&access_mode=read_only", ] conn_pool = DuckDBPool(paths) ``` Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases. You can also create multiple connections to **separate databases** in **separate MotherDuck accounts** using *different DuckDB instances*. However, keep in mind that each connection takes time to establish. 
Create multiple paths where each uses a different database path and the access token for its account (`<token1>` through `<token3>`):

```python
paths = [
    "md:my_db1?motherduck_token=<token1>&access_mode=read_only",
    "md:my_db2?motherduck_token=<token2>&access_mode=read_only",
    "md:my_db3?motherduck_token=<token3>&access_mode=read_only",
]
conn_pool = DuckDBPool(paths)
```

Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases.

### How to run queries with a thread pool

You can then fetch connections from the pool, for example, to run queries from a queue. You can use a `ThreadPoolExecutor` with 3 workers to fetch connections from the pool and run the queries using a `run_query` function:

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(conn_pool: DuckDBPool, query: str):
    _log.debug(f"Run query: {query}")
    conn = conn_pool.connect()
    rows = conn.execute(query)
    res = rows.fetchall()
    conn.close()  # returns the connection to the pool
    _log.debug(f"Done running query: {query}")
    return res

# database_paths and queries are defined as in the examples above
with ThreadPoolExecutor(max_workers=3) as executor:
    conn_pool = DuckDBPool(database_paths)
    futures = [executor.submit(run_query, conn_pool, query) for query in queries]
    for future, query in zip(futures, queries):
        result = future.result()
        print(f"Query [{query}] num rows: {len(result)}")
```

Reset the connection pool at least once every 24 hours by closing and reopening all connections. This ensures that you are always running on the latest version of MotherDuck. Note that `recreate()` returns a new pool configured like the original, so reassign it:

```python
conn_pool.dispose()
conn_pool = conn_pool.recreate()
```

---

---
title: Multithreading and parallelism
description: Learn how to use multithreading and parallelism for special cases to read data from MotherDuck
---

DuckDB supports two concurrency models:

- Single-process read/write, where one process can both read and write to the database.
- Multi-process read-only (`access_mode = 'READ_ONLY'`), where multiple processes can read from the database, but none can write.

This approach provides significant performance benefits for analytics databases. You can find more details on how to handle multiple-process writes (or multiple read + write connections) in the [DuckDB documentation](https://duckdb.org/docs/stable/connect/concurrency.html).

## Closing Database Instances

The Python snippets below show two ways to control database instance lifetime. You can force a fresh instance by adding a `cache_buster` parameter to the connection string:

```py
con = duckdb.connect("md:my_db?cache_buster=123", config={"motherduck_token": my_other_token})
```

Or you can set the `motherduck_dbinstance_inactivity_ttl` setting to zero, so instances are closed as soon as they become inactive:

```py
con = duckdb.connect("md:my_db", config={"motherduck_token": token})
con.sql("SET motherduck_dbinstance_inactivity_ttl='0ms'")
```

Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create connections in multiple threads. For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use connection pooling. Implementation details can be seen in the cards linked below:

import DocCardList from '@theme/DocCardList';

---

---
title: Read Scaling
description: Learn how to scale your data applications using read scaling tokens
---

import Admonition from '@theme/Admonition';

Read-only data applications and business intelligence tools that serve multiple concurrent end users but connect to MotherDuck with a single MotherDuck user account can see poor performance due to resource contention.
This is because, by default, MotherDuck assigns all client instances of DuckDB that connect to MotherDuck with the same account to a single DuckDB instance running in the cloud (“duckling”).

To solve this problem, applications can create and use a ***Read Scaling Token*** to connect to MotherDuck and transparently access up to 4X compute resources to better handle concurrent query workloads generated by end users. The result is an improved end user experience and read performance for your applications!

With read scaling tokens, MotherDuck accounts now support scaling up to 4 replicas of your database that can be read concurrently. When connecting with a read scaling token, each concurrent end user connects to a ***read scaling replica*** of the database that is served by its own duckling. With read scaling enabled, your flock of ducklings grows to ensure each concurrent end user is served by its own duckling, up to the concurrency limit of 4. Beyond this limit, applications can gracefully degrade by having multiple end users be served by the same duckling, while still maintaining affinity between end user and duckling.

## Understanding read scaling tokens

MotherDuck offers a special type of personal access token called a **Read Scaling Token**. This token enables scalable read operations while restricting write capabilities. Here's a breakdown of how it works:

### Creating a read scaling token

You can generate a read scaling token through the **MotherDuck UI**. When [generating an access token][md-access-token], select "read scaling token" as the token type.

### Permissions and limitations

A read scaling token grants permission for **read operations**, such as querying tables, but **restricts write operations**, including:

- Updating tables
- Creating new databases
- Attaching or detaching databases

### How a DuckDB client uses read scaling tokens

When a DuckDB client connects to MotherDuck with a read scaling token:

- It is assigned to one of the **read scaling replicas** for the user account.
- This is in addition to the standard read-write "duckling" that is used without a scaling token.

These replicas are **eventually consistent**, meaning data from read operations may briefly lag behind the latest writes. This design supports scaling but does not guarantee real-time access to live data.

### Scaling and Replicas

- The **flock of read scaling replicas** can grow to a maximum size of **4**, and can be configured through the **MotherDuck UI**.
- Each replica operates independently to handle concurrent read operations.

### Session affinity with `session_hint`

To enhance replica affinity, the DuckDB connection string supports the `session_hint` parameter:

- Clients with the same `session_hint` value are assigned to the same replica.
- This parameter can be set to the ID of a user session, a user ID, or a hashed value for privacy.

By leveraging read scaling tokens and `session_hint`, you can efficiently scale read operations and group user sessions to optimize performance.

### Instance caching with `dbinstance_inactivity_ttl`

DuckDB integrations (Python, JDBC, R, and others) use an instance cache to ensure that queries to the same database use the same DuckDB instance.
This mechanism now includes a Time-to-Live (TTL) feature that keeps instances alive for a period after use:

- This improves read scaling by maintaining session affinity across separate queries.
- The TTL value can be customized in the connection string: `md:<database>?dbinstance_inactivity_ttl=5` (time in minutes).

This feature enhances the effectiveness of the `session_hint` parameter by ensuring that frequent queries from the same client are more likely to hit the same duckling, even if there are short gaps between connections.

## Ensuring Data Freshness

In read scaling mode, you can expect the read scaling instances to pick up changes from the read-write instance within a few minutes. For most use cases, this default behavior is sufficient, and no additional steps are required.

However, if your use case requires immediate synchronization between the read-write instance and the read-scaling instances, you can create snapshots of databases on the writer connection and refresh the corresponding database on read-scaling connections. Creating a snapshot of a database will interrupt any ongoing queries interacting with that database.

### Example Workflow

```sql
-- Writer connection
CREATE TABLE my_db.my_table AS ...;
CREATE SNAPSHOT OF my_db;

-- Read-scaling connection
REFRESH DATABASES;      -- Refreshes all connected databases and shares
REFRESH DATABASE my_db; -- Alternatively, refresh a specific database
```

This approach guarantees that readers see the most recent snapshot. Learn more about [REFRESH DATABASES](/sql-reference/motherduck-sql-reference/refresh-database.md) and [CREATE SNAPSHOT](/sql-reference/motherduck-sql-reference/create-snapshot.md).

## Best Practices

Some recommendations for getting the best user experience out of MotherDuck's current read scaling features:

### Provision your flock based on number of concurrent end users

Assigning each end user their own duckling leverages DuckDB's strengths on a single-user workload and will provide an optimal user experience. In practice, applications may choose to accept some graceful degradation due to co-location of multiple users on a duckling in order to bound costs. This can be done by configuring a maximum size for the flock of read replicas based on an expected number of concurrent end users. While we expect the current limit of 4 to support many use cases, we also anticipate that limit to increase over time, enabling further scale-out for applications that require it.

### Use local resources where it makes sense

WASM allows applications to run a client instance of DuckDB and connect to MotherDuck in the browser. Shifting processing to the client will make better use of local resources, and may scale better to a large number of end users compared to distributing that processing over a relatively small number of ducklings.

### Maintain end user affinity with ducklings

Running all queries for a given end user on the same duckling improves user experience due to several factors: query performance benefits from caching, users see more consistent views of data across queries, and there is some measure of isolation between users. To that end, applications should use the session hint mechanism to implement user affinity. In particular, deriving the session hint from end user identity results in all client instances of DuckDB created for a given end user being assigned to the same read replica, as sketched below.
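A minimal sketch of this pattern in Python (the database name and the `motherduck_read_scaling_token` environment variable are placeholder assumptions; the hint is hashed so it carries no personal data):

```python
import hashlib
import os

import duckdb

def connect_for_user(user_id: str) -> duckdb.DuckDBPyConnection:
    # Hash the end-user identity so the session hint contains no PII
    session_hint = hashlib.sha256(user_id.encode()).hexdigest()
    token = os.environ["motherduck_read_scaling_token"]  # placeholder env var holding a read scaling token
    # Every connection created for this user carries the same session_hint,
    # so all of their queries land on the same read scaling replica.
    return duckdb.connect(f"md:my_db?motherduck_token={token}&session_hint={session_hint}")
```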
[md]: https://motherduck.com/
[md-access-token]: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token

---

---
title: Interacting with cloud storage
description: Learn how to work with databases and MotherDuck
---

import DocCardList from '@theme/DocCardList';

---

---
sidebar_position: 5
title: Querying Files in Amazon S3
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Since MotherDuck is hosted in the cloud, one of its benefits is better and faster interoperability with Amazon S3. MotherDuck's "hybrid mode" automatically routes queries that read from Amazon S3 to MotherDuck's execution runtime in the cloud rather than executing these queries locally.

:::note
MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2).
:::

MotherDuck supports the [DuckDB dialect](https://duckdb.org/docs/guides/import/s3_import) to query data stored in Amazon S3. Such queries are automatically routed to MotherDuck's cloud execution engines for faster and more efficient execution.

Here are some examples of querying data in Amazon S3:

```sql
SELECT * FROM read_parquet('s3://<bucket>/<file>');
SELECT * FROM read_parquet(['s3://<bucket>/<file1>', ... ,'s3://<bucket>/<fileN>']);
SELECT * FROM read_parquet('s3://<bucket>/*');
SELECT * FROM 's3://<bucket>/<path>/*';
SELECT * FROM iceberg_scan('s3://<bucket>/<iceberg_table_folder>', ALLOW_MOVED_PATHS=true);
SELECT * FROM delta_scan('s3://<bucket>/<delta_table_folder>');
```

See [Apache Iceberg](/integrations/file-formats/apache-iceberg.mdx) for more information on reading Iceberg data. See [Delta Lake](/integrations/file-formats/delta-lake.mdx) for more information on reading Delta Lake data.

## Accessing private files in Amazon S3

Protected Amazon S3 files require an AWS access key and secret. You can configure MotherDuck using [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md).

---

---
sidebar_position: 5
title: Writing Data to Amazon S3
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

You can use MotherDuck to transform files on Amazon S3 or export data from MotherDuck to Amazon S3.

:::note
MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2).
:::

MotherDuck supports the [DuckDB dialect](https://duckdb.org/docs/guides/import/s3_export) to write data to Amazon S3. MotherDuck will write data in Parquet format.

## Syntax

```sql
COPY <table_name>
TO 's3://<bucket>/[<folder>]/<file>';
```

## Example usage

```sql
COPY ducks_table TO 's3://ducks_bucket/ducks.parquet';
```

---

---
title: Building data apps
---

Data apps are interactive tools designed to offer insights or automate actions using data. Examples include data visualizations and custom reporting tools for business groups.

You might also come across "embedded analytics." This concept brings the kind of data exploration previously confined to dashboards and traditional BI (Business Intelligence) tools right into the software that teams and customers already use.

For an interactive exploration of the capabilities offered by DuckDB Wasm and MotherDuck, we invite you to experience our live demo [here](https://motherduckdb.github.io/wasm-client/mosaic-integration/). Please note that while the data visualization serves as a medium, the focus is on the underlying technology's features.

## How to Develop Data Apps?

Building data apps involves a blend of data handling, storage, and visualization technologies. The development process usually includes:

- **Data Processing**: The initial phase involves gathering, cleaning, and formatting data so it's ready for the app.
- **Data Storage**: Processed data is then stored in a database or data warehouse to ensure the app can quickly and efficiently fetch the data when needed.
- **Data Visualization**: An essential feature of data apps is visualization ([Streamlit](https://streamlit.io/), [Mosaic](https://uwdata.github.io/mosaic/), [D3js](https://d3js.org/), [Chartjs](https://www.chartjs.org/), etc.), enabling users to see and understand data patterns.

As data apps are embedded into the software that teams and customers already use, this often involves JavaScript libraries like D3.js, Chart.js, or Mosaic. Note that Python libraries like Streamlit or Dash are also popular for data apps.

![hld](./img/dataapps_hld.png)

### Challenges in Data App Development

Creating data apps comes with its hurdles, mainly due to data's complexity. Here are two typical challenges:

1) **Data Source Responsiveness**: Data apps often need real-time data, which can be tough to swiftly process and analyze.
2) **Data Infrastructure Load**: Data apps can demand a lot from data infrastructure, particularly if vast amounts of data need processing and storage.

Typically, after it has been processed, the useful data often finds its home in a cloud data warehouse, an OLAP system. Although you could query your OLTP database (such as Postgres), where the original data is often stored, this approach can exert significant query processing pressure on a system not designed for such tasks. OLAP systems like cloud data warehouses perform better than OLTP systems for the kinds of queries (e.g. filters, aggregations) that data apps need to run.

That being said, some applications have latency requirements which make it hard to query a cloud data warehouse directly. To mitigate this, it's common practice to establish a caching layer. With this architecture, we get the best of both worlds: the data is stored in a cloud data warehouse, and the data app can query a local cache (DuckDB) for low-latency queries.

### Architecture of Data Apps

#### MotherDuck's novel Wasm-Powered 1.5-tier architecture

DuckDB can run anywhere, including in the browser thanks to WebAssembly (Wasm). [Wasm](https://webassembly.org/) is a technology that lets you run code from languages like Rust or C++ in web browsers, making web apps run faster and more efficiently alongside JavaScript.
Using DuckDB Wasm, client-side JavaScript can process data locally, enabling faster analytics experiences. Once the data is moved to the local context, many user interactions won't require any communication with the cloud, resulting in lower latency and quicker performance for the user. Additionally, less cloud computing translates to lower costs for the developer.

We call this a 1.5-tier architecture because the client (one tier) now also holds part of the database; the other part of the database lives in the cloud, hence the extra half tier.

![wasm](./img/wasm-powered-1.5tierarch.png)

#### 3-tier architecture

A 3-tier architecture powers the vast majority of applications today. Managing integrations and updates between the client, server, and database is time-consuming and unwieldy. For users, multiple steps between them and the data may slow performance and speed at scale. In this case, the server will use a DuckDB client to connect to MotherDuck.

![3tier](./img/traditional-3-tier.png)

### Getting started with data apps and MotherDuck

To get started with data apps and MotherDuck, check out our [MotherDuck Wasm client documentation](/key-tasks/data-apps/wasm-client.md).

---

# MotherDuck Wasm Client

[MotherDuck](https://motherduck.com/) is a managed DuckDB-in-the-cloud service. [DuckDB Wasm](https://github.com/duckdb/duckdb-wasm) brings DuckDB to every browser thanks to WebAssembly. The MotherDuck Wasm Client library enables using MotherDuck through DuckDB Wasm in your own browser applications.

## Examples

Example projects and live demos can be found [here](https://github.com/motherduckdb/wasm-client).

## Status

Please note that the MotherDuck Wasm Client library is in an early stage of active development. Its structure and API may change considerably. Our current intention is to align more closely with the DuckDB Wasm API in the future, to make using MotherDuck with DuckDB Wasm as easy as possible.

## DuckDB Version Support

- The MotherDuck Wasm Client library uses the same version of DuckDB Wasm as the MotherDuck web UI. Since the DuckDB Wasm assets are fetched dynamically, and the MotherDuck web UI is updated weekly and adopts new DuckDB versions promptly, the DuckDB version used could change even without upgrading the MotherDuck Wasm Client library. Check `pragma version` to see which DuckDB version is in use.

## Installation

`npm install @motherduck/wasm-client`

## Requirements

To facilitate efficient communication across worker threads, the MotherDuck Wasm Client library currently uses advanced browser features, including [SharedArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer). Due to [security requirements](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer#security_requirements) of modern browsers, these features require applications to be [cross-origin isolated](https://developer.mozilla.org/en-US/docs/Web/API/crossOriginIsolated). To use the MotherDuck Wasm Client library, your application must be in cross-origin isolation mode, which is enabled when it is served with the following headers:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```

You can check whether your application is in this mode by examining the [crossOriginIsolated](https://developer.mozilla.org/en-US/docs/Web/API/crossOriginIsolated) property in the browser console.
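For example, a minimal guard at application startup (a sketch; the error message is your choice):

```ts
if (!crossOriginIsolated) {
  // Without cross-origin isolation, SharedArrayBuffer is unavailable and the
  // MotherDuck Wasm Client cannot communicate across its worker threads.
  throw new Error('This app must be served with COOP/COEP headers to enable cross-origin isolation.');
}
```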
Note that applications in this mode are restricted in [some](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cross-Origin-Opener-Policy#same-origin) [ways](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cross-Origin-Embedder-Policy#require-corp). In particular, resources from different origins can only be loaded if they are served with a [Cross-Origin-Resource-Policy (CORP)](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cross-Origin-Resource-Policy) header with the value `cross-origin`.

## Dependencies

The MotherDuck Wasm Client library depends on `apache-arrow` as a peer dependency. If you use `npm` version 7 or later to install `@motherduck/wasm-client`, then `apache-arrow` will automatically be installed, if it is not already. If you already have `apache-arrow` installed, then `@motherduck/wasm-client` will use it, as long as it is a compatible version (`^14.0.x` at the time of this writing).

Optionally, you can use a variant of `@motherduck/wasm-client` that bundles `apache-arrow` instead of relying on it as a peer dependency. Don't use this option if you are using `apache-arrow` elsewhere in your application, because different copies of this library don't work together. To use this version, change your imports to:

```ts
import '@motherduck/wasm-client/with-arrow';
```

instead of:

```ts
import '@motherduck/wasm-client';
```

## Usage

The MotherDuck Wasm Client library is written in TypeScript and exposes full TypeScript type definitions. These instructions assume you are using it from TypeScript.

Once you have installed `@motherduck/wasm-client`, you can import the main class, `MDConnection`, as follows:

```ts
import { MDConnection } from '@motherduck/wasm-client';
```

### Creating Connections

To create a `connection` to a MotherDuck-connected DuckDB instance, call the `create` static method:

```ts
const connection = MDConnection.create({
  mdToken: token
});
```

The `mdToken` parameter is required and should be set to a valid MotherDuck access token. You can create a MotherDuck access token in the MotherDuck UI. For more information, see [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-an-access-token).

The `create` call returns immediately, but starts the process of loading the DuckDB Wasm assets from `https://app.motherduck.com` and starting the DuckDB Wasm worker. This initialization process happens asynchronously. Any query evaluated before initialization is complete will be queued. To determine whether initialization is complete, call the `isInitialized` method, which returns a promise resolving to `true` when DuckDB Wasm is initialized:

```ts
await connection.isInitialized();
```

Multiple connections can be created. Connections share a DuckDB Wasm instance, so creating subsequent connections will not repeat the initialization process. Queries evaluated on different connections happen concurrently; queries evaluated on the same connection are queued sequentially.

### Evaluating Queries

To evaluate a query, call the `evaluateQuery` method on the `connection` object:

```ts
try {
  const result = await connection.evaluateQuery(sql);
  console.log('query result', result);
} catch (err) {
  console.log('query failed', err);
}
```

The `evaluateQuery` method returns a [promise](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous/Promises) for the result.
In an [async function](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/async_function), you can use the `await` syntax as above. Or, you can use the `then` and/or `catch` methods:

```ts
connection.evaluateQuery(sql).then((result) => {
  console.log('query result', result);
}).catch((reason) => {
  console.log('query failed', reason);
});
```

See [Results](#results) below for the structure of the result object.

### Prepared Statements

To create a [prepared](https://duckdb.org/docs/api/c/prepared) [statement](https://duckdb.org/docs/api/wasm/query#prepared-statements) for later evaluation, use the `prepareQuery` method:

```ts
const prepareResult = await connection.prepareQuery('SELECT v + ? FROM generate_series(0, 10000) AS t(v);');
```

This returns an [AsyncPreparedStatement](https://shell.duckdb.org/docs/classes/index.AsyncPreparedStatement.html), which can be evaluated later using the `send` method:

```ts
const arrowStream = await prepareResult.send(234);
```

Note: The `query` method of the AsyncPreparedStatement should not be used, because it can lead to deadlock when combined with the MotherDuck extension.

To immediately evaluate a prepared statement, call the `evaluatePreparedStatement` method:

```ts
const result = await connection.evaluatePreparedStatement('SELECT v + ? FROM generate_series(0, 10000) AS t(v);', [234]);
```

This returns a materialized result, as described in [Results](#results) below.

### Canceling Queries

To evaluate a query that can be canceled, use the `enqueueQuery` and `evaluateQueuedQuery` methods:

```ts
const queryId = connection.enqueueQuery(sql);
const result = await connection.evaluateQueuedQuery(queryId);
```

To cancel a query evaluated in this fashion, use the `cancelQuery` method, passing the `queryId` returned by `enqueueQuery`:

```ts
const queryWasCanceled = await connection.cancelQuery(queryId);
```

The `cancelQuery` method returns a promise for a boolean indicating whether the query was successfully canceled. The result promise of a canceled query will be rejected with an error message. The `cancelQuery` method takes an optional second argument for controlling this message:

```ts
const queryWasCanceled = await connection.cancelQuery(queryId, 'custom error message');
```

### Streaming Results

The query methods above return fully materialized results. To evaluate a query and return a stream of results, use `evaluateStreamingQuery` or `evaluateStreamingPreparedStatement`:

```ts
const result = await connection.evaluateStreamingQuery(sql);
```

See [Results](#results) below for the structure of the result object.

### Error Handling

The query result promises returned by `evaluateQuery`, `evaluatePreparedStatement`, `evaluateQueuedQuery`, and `evaluateStreamingQuery` will be rejected in the case of an error. For convenience, "safe" variants of these methods are provided that catch this error and always resolve to a value indicating success or failure. For example:

```ts
const result = await connection.safeEvaluateQuery(sql);
if (result.status === 'success') {
  console.log('rows', result.rows);
} else {
  console.log('error', result.err);
}
```

### Results

A successful query result may either be fully materialized, or it may contain a stream. Use the `type` property of the result object, which is either `'materialized'` or `'streaming'`, to distinguish these.

#### Materialized Results

A materialized result contains a `data` property, which provides several methods for getting the results.
The number of columns and rows in the result are available through the `columnCount` and `rowCount` properties of `data`. Column names and types can be retrieved using the `columnName(columnIndex)` and `columnType(columnIndex)` methods. Individual values can be accessed using the `value(columnIndex, rowIndex)` method. See below for details about the forms values can take.

Several convenience methods also simplify common access patterns; see `singleValue()`, `columnNames()`, `deduplicatedColumnNames()`, and `toRows()`.

The `toRows()` method is especially useful in many cases. It returns the result as an array of row objects. Each row object has one property per column, named after that column. (Multiple columns with the same name are deduplicated with suffixes.)

The type of each column property of a row object depends on the type of the corresponding column in DuckDB. Many values are converted to a JavaScript primitive type, such as `boolean`, `number`, or `string`. Some numeric values too large to fit in a JavaScript `number` (e.g. a DuckDB [BIGINT](https://duckdb.org/docs/sql/data_types/numeric#integer-types)) are converted to a JavaScript `bigint`.

Some DuckDB types, such as [DATE](https://duckdb.org/docs/sql/data_types/date), [TIME](https://duckdb.org/docs/sql/data_types/time), [TIMESTAMP](https://duckdb.org/docs/sql/data_types/timestamp), and [DECIMAL](https://duckdb.org/docs/sql/data_types/numeric#fixed-point-decimals), are converted to JavaScript objects implementing an interface specific to that type. Nested types such as DuckDB [LIST](https://duckdb.org/docs/sql/data_types/list), [MAP](https://duckdb.org/docs/sql/data_types/map), and [STRUCT](https://duckdb.org/docs/sql/data_types/struct) are also exposed through special JavaScript objects.

These objects all implement `toString` to return a string representation. For primitive types, this representation is identical to DuckDB's string conversion (e.g. using [CAST](https://duckdb.org/docs/sql/expressions/cast.html) to VARCHAR). For nested types, the representation is equivalent to the syntax used to construct these types. They also have properties exposing the underlying value. For example, the object for a DuckDB TIME has a `microseconds` property (of type `bigint`). See the TypeScript type definitions for details.

Note that these result types differ from those returned by DuckDB Wasm without the MotherDuck Wasm Client library. The MotherDuck Wasm Client library implements custom conversion logic to preserve the full range of some types.

#### Streaming Results

A streaming result contains three ways to consume the results: `arrowStream`, `dataStream`, and `dataReader`.

The first two (`arrowStream` and `dataStream`) implement the async iterator protocol, and return items representing batches of rows, but return different kinds of batch objects. Batches correspond to DuckDB DataChunks, which are no more than 2048 rows. The third (`dataReader`) wraps `dataStream` and makes consuming multiple batches easier.

The `dataStream` iterator returns a sequence of `data` objects, each of which implements the same interface as the `data` property of a materialized query result, described above. The `dataReader` implements the same `data` interface, but also adds useful methods such as `readAll` and `readUntil`, which can be used to read at least a given number of rows, possibly across multiple batches.

The `arrowStream` property provides access to the underlying Arrow RecordBatch stream reader.
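As a minimal sketch (assuming the streaming result exposes `dataStream` directly, per the description above), iterating the batches might look like this:

```ts
const result = await connection.evaluateStreamingQuery(sql);
if (result.type === 'streaming') {
  for await (const batch of result.dataStream) {
    // Each batch implements the same `data` interface as a materialized result
    for (let rowIndex = 0; rowIndex < batch.rowCount; rowIndex++) {
      console.log(batch.value(0, rowIndex)); // first column of each row
    }
  }
}
```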
The `arrowStream` reader, by contrast, can be useful if you need the underlying Arrow representation. Also, this stream has convenience methods such as `readAll` to materialize all batches. Note, however, that Arrow sometimes performs lossy conversion of the underlying data to JavaScript types for certain DuckDB types, especially dates, times, and decimals. Also, converting Arrow values to strings will not always match DuckDB's string conversion.

Note that results of remote queries are not streamed end-to-end yet. Results of remote queries are fully materialized on the client upstream of this API. So the first batch will not be returned from this API until all results have been received by the client. End-to-end streaming of remote query results is on our roadmap.

### DuckDB Wasm API

To access the underlying DuckDB Wasm instance, use the `getAsyncDuckDb` function. Note that this function returns (a Promise to) a singleton instance of DuckDB Wasm also used by the MotherDuck Wasm Client.

---

---
sidebar_position: 1
title: Github Actions
---

# Orchestrating Queries with GitHub Actions

GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that allows you to automate your build, test, and deployment pipeline. You can create workflows that build and test every pull request to your repository, or deploy merged pull requests to production. For the purposes of data warehousing, we can use GitHub Actions to extract, load, and transform data as a simple cron job. You can learn more about [GitHub Actions on the documentation pages](https://docs.github.com/en/actions).

## Triggering GitHub Actions

This how-to guide will cover two invocation examples: Actions invoked via `workflow_dispatch` (manually triggered by a button in GitHub) and via a scheduled job. After reviewing the job invocation methodology, it continues on to show the definition of a container, installation of DuckDB, and then execution of some basic operations in MotherDuck. It should be noted that this is not intended to be a complete document - rather, a narrow slice of useful code that can be directly applied to the types of problems that can be solved with MotherDuck.

### Manually triggered actions

The most basic way to use GitHub Actions is to use `workflow_dispatch` so that the action can be triggered by clicking a button in GitHub. Detailed documentation about this can be found on the [GitHub website](https://docs.github.com/en/actions/managing-workflow-runs-and-deployments/managing-workflow-runs/manually-running-a-workflow).

Using `workflow_dispatch` in practice looks like this:

```yml
name: manual_build
on:
  workflow_dispatch:
    inputs:
      name:
        # Friendly description to be shown in the UI instead of 'name'
        description: 'What is the reason to trigger this manually?'
        # Default value if no value is explicitly provided
        default: 'testing github actions'
        # Input has to be provided for the workflow to run
        required: false
jobs:
  ...
```

### Running cron jobs

Many types of jobs are better suited for scheduled orchestration. This can be done with the `schedule` attribute, which will use [traditional cron syntax](https://healthchecks.io/docs/cron/) to determine when to run the job.

Using `schedule` can look like this:

```yml
name: 'Scheduled Run'
on:
  schedule:
    - cron: '0 10 * * *' # This line sets the job to run every day at 10am UTC
jobs:
  ...
```

## Defining Jobs & Steps

After the invocation method is defined, the jobs should be defined. These contain the specific steps required to accomplish the job.
For this example, we will define the container, install DuckDB, and then run a script against MotherDuck. Job definition can look like this:

```yml
jobs:
  deploy:
    name: 'Deploy'
    runs-on: ubuntu-latest
```

We have now defined the Action environment, which is the latest stable version of Ubuntu. There are of course other environments these jobs can run on, but the Ubuntu container is a great starting point because it can also be easily shared with GitHub Codespaces, which makes testing easier.

:::note
GitHub Actions are composable, but for simplicity this guide will not cover how to link actions to each other, or other more advanced steps. This can all be found in the [documentation on GitHub](https://docs.github.com/en/actions).
:::

After the Job is defined, we add the steps. Since this is YAML, the spacing is important, which is why the steps are indented.

```yml
    steps:
      # check out master using the "Checkout" action
      - name: Check out
        uses: actions/checkout@master
      # install duckdb binary
      - name: Install DuckDB
        run: |
          wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-amd64.zip
          unzip duckdb_cli-linux-amd64.zip
          rm duckdb_cli-linux-amd64.zip
      # run sql script with a specific token
      - name: Run SQL script
        env:
          MOTHERDUCK_TOKEN: ${{ secrets.MOTHERDUCK_TOKEN }}
        run: ./duckdb < script.sql
```

The example script invoked above looks like this:

```sql
-- attach to motherduck
ATTACH 'md:';

-- set the database
USE my_db;

-- create the table if it doesn't exist
CREATE TABLE IF NOT EXISTS target (
    source VARCHAR(255),
    timestamp TIMESTAMP
);

-- insert a row
INSERT INTO target (source, timestamp)
VALUES ('github action', CURRENT_TIMESTAMP);
```

## Other Considerations

In order to use this Action as currently written, you will need to create a secret in your repo called `MOTHERDUCK_TOKEN` with a [token generated from your MotherDuck account](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token).

## Handling More Complex Workflows

The example on this page covers very simple, single-step orchestration. For more complex requirements, please check out our [orchestration partners](https://motherduck.com/ecosystem/?category=Orchestration). An overview of the MotherDuck Ecosystem is shown below.

![Diagram](../../../img/md-diagram.svg)

---

---
sidebar_position: 10
title: Flat Files
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import DownloadLink from '@site/src/components/DownloadLink';

# Replicating Flat Files to MotherDuck

The goal of this guide is to show users simple examples of loading data from flat file sources into MotherDuck. Examples are shown for both the MotherDuck Web UI and the DuckDB CLI. To install the DuckDB CLI, [check out the instructions first.](/documentation/getting-started/connect-query-from-duckdb-cli.mdx)

## CSV

From the UI, follow these steps:

1. Navigate to the **Add Data** section.
2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB.
3. Execute the generated query, which will create a table for you.
   1. Modify the query as needed to suit the correct Database / Schema / Table name.

In the CLI, you can load a CSV file using the `read_csv` function. For example:

### Local File

```sql
CREATE TABLE my_table AS
SELECT * FROM read_csv('path/to/local_file.csv');
```

### S3 File

To load from S3, ensure your DuckDB instance is configured with [S3 secrets](/documentation/integrations/cloud-storage/amazon-s3.mdx).
Then:

```sql
CREATE TABLE my_table AS
SELECT * FROM read_csv('s3://bucket-name/path-to-file.csv');
```

## JSON

From the UI, follow these steps:

1. Navigate to the **Add Data** section.
2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB.
3. Execute the generated query, which will create a table for you.
   1. Modify the query as needed to suit the correct Database / Schema / Table name.

In the CLI, use the `read_json` function to load JSON files.

### Local File

```sql
CREATE TABLE my_table AS
SELECT * FROM read_json('path/to/local_file.json');
```

### S3 File

Make sure S3 support is enabled as described in the [S3 secrets documentation](/documentation/integrations/cloud-storage/amazon-s3.mdx).

```sql
CREATE TABLE my_table AS
SELECT * FROM read_json('s3://bucket-name/path-to-file.json');
```

## Parquet

From the UI, follow these steps:

1. Navigate to the **Add Data** section.
2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB.
3. Execute the generated query, which will create a table for you.
   1. Modify the query as needed to suit the correct Database / Schema / Table name.

In the CLI, use the `read_parquet` function to load Parquet files.

### Local File

```sql
CREATE TABLE my_table AS
SELECT * FROM read_parquet('path/to/local_file.parquet');
```

### S3 File

Ensure S3 support is enabled as described in the [S3 secrets documentation](/documentation/integrations/cloud-storage/amazon-s3.mdx).

```sql
CREATE TABLE my_table AS
SELECT * FROM read_parquet('s3://bucket-name/path-to-file.parquet');
```

## Handling More Complex Workflows

Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, including some that offer native Python. An overview of the MotherDuck Ecosystem is shown below.

![Diagram](../../../img/md-diagram.svg)

---

---
sidebar_position: 1
title: PostgreSQL
---

# Replicating PostgreSQL tables to MotherDuck

This page shows basic patterns for using Python to connect to PostgreSQL using the [`postgres_scanner`](https://duckdb.org/docs/extensions/postgres.html), connect to MotherDuck, and then write the data from PostgreSQL into MotherDuck. For more complex replication scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion). If you are looking for the [pg_duckdb extension](https://github.com/duckdb/pg_duckdb), head on over to the [pg_duckdb explainer page](/concepts/pgduckdb).

To skip the documentation and look at the entire script, expand the element below:
SQL script

```sql
-- install pg extension in DuckDB
INSTALL postgres;
LOAD postgres;

-- attach pg as pg_db
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS pg_db (TYPE POSTGRES, READ_ONLY);

-- connect to MotherDuck
ATTACH 'md:my_db';

-- insert data into MotherDuck
CREATE OR REPLACE TABLE my_db.main.postgres_table AS
SELECT * FROM pg_db.public.some_table;
```
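If you prefer to drive these same steps from Python rather than the DuckDB CLI, a minimal sketch (assuming the same connection details as the script above, and a `motherduck_token` available in the environment) looks like this:

```python
import duckdb

# Connect to MotherDuck; the token is read from the motherduck_token environment variable
con = duckdb.connect("md:my_db")

# Install and load the postgres extension locally
con.sql("INSTALL postgres;")
con.sql("LOAD postgres;")

# Attach the Postgres database read-only
con.sql("ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS pg_db (TYPE POSTGRES, READ_ONLY);")

# Replicate the table into MotherDuck
con.sql("CREATE OR REPLACE TABLE my_db.main.postgres_table AS SELECT * FROM pg_db.public.some_table;")
```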
## Loading the PostgreSQL Extension & Authenticating

:::info
MotherDuck does not yet support the PostgreSQL and MySQL extensions, so you need to perform the following steps on your own computer or cloud computing resource. We are working on supporting the PostgreSQL extension on the server side so that this can happen within the MotherDuck app in the future with improved performance.
:::

The first step to connect to Postgres is to install & load the postgres extension using the [DuckDB CLI](/getting-started/connect-query-from-duckdb-cli):

```sql
INSTALL postgres;
LOAD postgres;
```

Once this is completed, you can connect to Postgres by attaching it to your DuckDB session:

```sql
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS pg_db (TYPE POSTGRES, READ_ONLY);
```

More detailed information can be found in the [DuckDB documentation](https://duckdb.org/docs/extensions/postgres.html#connecting).

## Connecting to MotherDuck & inserting the table

Once you are connected to your Postgres database, you need to connect to MotherDuck. To learn more about authentication, [go here](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck).

```sql
ATTACH 'md:my_db';
```

Once you have authenticated, you can execute a CTAS ("Create Table As Select") statement in SQL to replicate data from Postgres into MotherDuck.

```sql
CREATE OR REPLACE TABLE my_db.main.postgres_table AS
SELECT * FROM pg_db.public.some_table;
```

Congratulations! You have now replicated data from Postgres into MotherDuck.

## Handling More Complex Workflows

Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, including some that offer native Python. An overview of the MotherDuck Ecosystem is shown below.

![Diagram](../../../img/md-diagram.svg)

---

---
sidebar_position: 20
title: Spreadsheets
---

# Replicating Spreadsheets to MotherDuck

Key bits of data and side schedules often exist in spreadsheets. It is nice to be able to easily add that data to your data warehouse and query it. This guide aims to show you how to perform this workflow using the DuckDB CLI.

:::tip
In order to use these extensions, you will need to first install the DuckDB CLI. [Instructions can be found here.](/documentation/getting-started/connect-query-from-duckdb-cli.mdx)
:::

## Microsoft Excel

:::note
The purpose of this guide is to show you how to load data from Excel into MotherDuck. [Detailed documentation on loading xlsx files can be found on DuckDB.org](https://duckdb.org/docs/guides/file_formats/excel_import.html).
:::

To read from an Excel spreadsheet, first install and load the `spatial` extension. Do not use the `excel` extension, which serves an entirely different purpose. The SQL to do this is below:

```sql
INSTALL spatial;
LOAD spatial;
```

This provides the `st_read` function, so you can then query an Excel file by its path, for example:

```sql
SELECT * FROM st_read('myfile.xlsx', layer = 'Sheet1');
```

More typically, you will pass the fully qualified file name to the `st_read` function. Getting the qualified name depends on your operating system:

- Windows: hold Shift and right-click the file or folder, then select "Copy as Path" from the context menu.
- Mac OS: right-click the file or folder, press and hold "Option" and then select "Copy ... as Pathname" from the context menu.
Congratulations! You have now replicated data from Postgres into MotherDuck.

## Handling More Complex Workflows

Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, some offering native Python support. An overview of the MotherDuck Ecosystem is shown below.

![Diagram](../../../img/md-diagram.svg)

---

---
sidebar_position: 20
title: Spreadsheets
---

# Replicating Spreadsheets to MotherDuck

Key bits of data and side schedules often exist in spreadsheets. It is nice to be able to easily add that data to your data warehouse and query it. This guide aims to show you how to perform this workflow using the DuckDB CLI.

:::tip
In order to use these extensions, you will need to first install the DuckDB CLI. [Instructions can be found here.](/documentation/getting-started/connect-query-from-duckdb-cli.mdx)
:::

## Microsoft Excel

:::note
The purpose of this guide is to show you how to load data from Excel into MotherDuck. [Detailed documentation on loading xlsx files can be found on DuckDB.org](https://duckdb.org/docs/guides/file_formats/excel_import.html).
:::

To read from an Excel spreadsheet, first install and load the `spatial` extension. Do not use the `excel` extension, which serves an entirely different purpose. The SQL to do this is below:

```sql
INSTALL spatial;
LOAD spatial;
```

This installs the `st_read` function, so you can then query an Excel file by its path, for example:

```sql
SELECT * FROM st_read('myfile.xlsx', layer = 'Sheet1');
```

More typically, you will pass the fully qualified file name to the `st_read` function. Getting the qualified name depends on your operating system:

- Windows: hold Shift and right-click the file or folder, then select "Copy as Path" from the context menu.
- macOS: right-click the file or folder, press and hold "Option" and then select "Copy ... as Pathname" from the context menu.

You can then paste that into your SQL query, as shown below:

```sql
SELECT * FROM st_read('C:\users\sql_user\documents\myfile.xlsx', layer = 'Sheet1');
```

The previous query simply returns the data set to the terminal, but it can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS).

```sql
CREATE OR REPLACE TABLE my_db.main.my_table AS -- use fully qualified table name
SELECT * FROM st_read('C:\users\documents\myfile.xlsx', layer = 'Sheet1');
```

Of course, sometimes there is data in multiple tabs. In that case, you can use the `layer` parameter to pass the tab names and, depending on the context, even union multiple tabs into a single table.

```sql
CREATE OR REPLACE TABLE my_db.main.my_table AS -- use fully qualified table name
SELECT * FROM st_read('C:\users\documents\myfile.xlsx', layer = 'Sheet1')
UNION ALL
SELECT * FROM st_read('C:\users\documents\myfile.xlsx', layer = 'Sheet2');
```

## Google Sheets

The first step to handle Google Sheets is to install the [duckdb-gsheets](https://duckdb-gsheets.com/) extension. That is done with these commands from the DuckDB CLI:

```sql
INSTALL gsheets FROM community;
LOAD gsheets;
```

Since Google Sheets is a hosted application, we need to use [DuckDB Secrets](https://duckdb.org/docs/configuration/secrets_manager.html) to handle authentication. This is as simple as:

```sql
CREATE SECRET (TYPE gsheet);
```

:::note
Using this workflow will require interactivity with a browser, so if you need to run it from a job (i.e. Airflow or similar), consider setting up a [Google API access token](https://duckdb-gsheets.com/#getting-a-google-api-access-token).
:::

In order to read from a Google Sheet, we need at minimum the sheet id, which is found in the URL, for example `https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit`. The string between `d/` and `/edit` is the spreadsheet id. The sheet can therefore be queried with:

```sql
SELECT * FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit');
```

The previous query simply returns the data set to the terminal, but it can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS).

```sql
CREATE OR REPLACE TABLE my_db.main.my_table AS -- use fully qualified table name
SELECT * FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit');
```

For convenience, the spreadsheet id itself can be queried as well.

```sql
SELECT * FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8');
```

To query data from multiple tabs, the tab name can be passed as a parameter using `sheet` to select the preferred tab.

```sql
SELECT * FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8', sheet='Sheet2');
```

For more detailed documentation, including writing to Google Sheets, review the [duckdb-gsheets documentation](https://duckdb-gsheets.com/#getting-a-google-api-access-token).
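Writing in the other direction is also possible. As a rough sketch based on the duckdb-gsheets documentation (the spreadsheet id here is a placeholder, and the same `gsheet` secret must already be configured):

```sql
-- copy a MotherDuck table out to an existing Google Sheet
COPY my_db.main.my_table
TO 'https://docs.google.com/spreadsheets/d/YOUR_SPREADSHEET_ID/edit'
(FORMAT gsheet);
```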
## Handling More Complex Workflows

Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, some offering native Python support. An overview of the MotherDuck Ecosystem is shown below.

![Diagram](../../../img/md-diagram.svg)

---

---
sidebar_position: 2
title: SQL Server
---

# Replicating SQL Server tables to MotherDuck

This page shows basic patterns for using Python to connect to SQL Server, read data into a dataframe, connect to MotherDuck, and then write the data from the dataframe into MotherDuck. For more complex replication scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion). To skip the documentation and look at the entire script, expand the element below:
Python script

```py
import pyodbc
import pandas as pd
import duckdb

# Define your connection parameters
server = 'ip_address'
database = 'master'  # or use your database name
username = 'your_username'
password = 'your_password'  # consider using a secret manager or .env
port = 1433  # default SQL Server port

# Define the connection string for ODBC Driver 17
connection_string = (
    f"DRIVER={{ODBC Driver 17 for SQL Server}};"
    f"SERVER={server},{port};"
    f"DATABASE={database};"
    f"UID={username};"
    f"PWD={password};"
)

# Connect to SQL Server
connection = None
try:
    connection = pyodbc.connect(connection_string)
    print("Connection successful.")
except pyodbc.Error as e:
    print(f"Error: {e}")
finally:
    if connection is not None:
        connection.close()

# Read a SQL Server table into a DataFrame
try:
    connection = pyodbc.connect(connection_string)
    query = "SELECT * FROM AdventureWorks2022.Production.BillOfMaterials"

    # Execute the query using pyodbc
    cursor = connection.cursor()
    cursor.execute(query)

    # Fetch the column names and data
    columns = [column[0] for column in cursor.description]
    data = cursor.fetchall()

    # Convert the data into a DataFrame
    df = pd.DataFrame.from_records(data, columns=columns)
finally:
    connection.close()

motherduck_token = 'your_token'

# Attach using the MOTHERDUCK_TOKEN
duckdb.sql(f"ATTACH 'md:my_db?MOTHERDUCK_TOKEN={motherduck_token}'")

# Create or replace table in the attached database
duckdb.sql(
    """
    CREATE OR REPLACE TABLE my_db.main.BillOfMaterials AS
    SELECT * FROM df
    """
)
```
## SQL Server Authentication

SQL Server supports [multiple methods of authentication](https://learn.microsoft.com/en-us/sql/relational-databases/security/choose-an-authentication-mode?view=sql-server-ver16). For the purpose of this example, we will use username/password authentication and [pyodbc](https://github.com/mkleehammer/pyodbc/), along with [ODBC Driver 17 for SQL Server](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16). Note that 'ODBC Driver 18 for SQL Server' is also available and includes support for some newer SQL Server features, but for the sake of compatibility, this example uses 17.

Consider the following authentication example:

```py
import pyodbc

# Define your connection parameters
server = 'ip_address'
database = 'master'  # or use your database name
username = 'your_username'
password = 'your_password'  # consider using a secret manager or .env
port = 1433  # default SQL Server port

# Define the connection string for ODBC Driver 17
connection_string = (
    f"DRIVER={{ODBC Driver 17 for SQL Server}};"
    f"SERVER={server},{port};"
    f"DATABASE={database};"
    f"UID={username};"
    f"PWD={password};"
)

# Connect to SQL Server
connection = None
try:
    connection = pyodbc.connect(connection_string)
    print("Connection successful.")
except pyodbc.Error as e:
    print(f"Error: {e}")
finally:
    if connection is not None:
        connection.close()
```

This will set your credentials, then attempt to connect to your server with `pyodbc.connect`, and print an error if the connection fails.

## Reading a SQL Server table into a dataframe

Once you have authenticated, you can define arbitrary queries, execute them with a `pyodbc` cursor, and load the results into a pandas DataFrame using the `query` and `connection` objects. For the purpose of this example, we are using SQL Server 2022 along with the AdventureWorks OLTP database.

:::note
While `pandas` is a great library, it is not particularly well-suited for very large tables. To learn more about using buffers and alternative libraries, check out [this link](/key-tasks/loading-data-into-motherduck/loading-data-md-python/).
:::

```py
import pandas as pd

try:
    connection = pyodbc.connect(connection_string)
    query = "SELECT * FROM AdventureWorks2022.Production.BillOfMaterials"

    # Execute the query using pyodbc
    cursor = connection.cursor()
    cursor.execute(query)

    # Fetch the column names and data
    columns = [column[0] for column in cursor.description]
    data = cursor.fetchall()

    # Convert the data into a DataFrame
    df = pd.DataFrame.from_records(data, columns=columns)
finally:
    connection.close()
```

## Inserting the table into MotherDuck

Now that the data has been loaded into a dataframe object, we can connect to MotherDuck and insert the table.

:::note
You will need to [generate a token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token) in your MotherDuck account. For production use cases, make sure to use a secret manager and never commit your token to your codebase.
:::

```py
import duckdb

motherduck_token = 'your_token'

# Attach using the MOTHERDUCK_TOKEN
duckdb.sql(f"ATTACH 'md:my_db?MOTHERDUCK_TOKEN={motherduck_token}'")

# Create or replace table in the attached database
duckdb.sql(
    """
    CREATE OR REPLACE TABLE my_db.main.BillOfMaterials AS
    SELECT * FROM df
    """
)
```

This will create the table, or replace it if the table already exists.
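If you would rather append to an existing table than replace it, the same pattern works with `INSERT INTO`. A minimal sketch, assuming the table and `df` from the example above, and run through `duckdb.sql` the same way:

```sql
-- append the dataframe's rows instead of recreating the table
INSERT INTO my_db.main.BillOfMaterials
SELECT * FROM df;
```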
## Handling More Complex Workflows

Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, some offering native Python support. An overview of the MotherDuck Ecosystem is shown below.

![Diagram](../../../img/md-diagram.svg)

---

---
title: Data Warehousing How-to
description: Data Warehousing How-to guides
---

import DocCardList from '@theme/DocCardList';

## What is a Data Warehouse?

A data warehouse consolidates data from multiple sources into one place for storage and analysis. It uses a common query language (SQL) and is often the jumping-off point for reporting, analytics, and supporting strategic decision making. The data warehouse serves as the bridge from raw data to a governed, scalable data set that serves downstream consumers. While DuckDB is excellent at processing and serving large datasets, MotherDuck adds the missing components to make it a true data warehouse.

![Architecture](./../../img/the-md-dwh.png)

Some common tools in a data stack are:

- BI tools for data visualization and reporting - Omni, Tableau, PowerBI
- Ingestion tools to load data in from business apps across your enterprise - Fivetran, Airbyte, Dlthub
- Transformation tools to make the data more usable - dbt, sqlmesh, paradime.io
- Orchestration tools to stitch it all together - Airflow, Kestra, Dagster

Some of these groups of tools have specific pages, which are linked here:

Please do not hesitate to **[contact us](https://motherduck.com/customer-support/)** if you need help along your journey.

---

---
sidebar_position: 1
title: Basic database operations
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

While embedded DuckDB uses files on your local filesystem to represent databases, MotherDuck implements SQL syntax for creating, listing and dropping databases.

## Create database

```sql
-- [OR REPLACE] and [IF NOT EXISTS] are optional modifiers.
CREATE [OR REPLACE | IF NOT EXISTS] DATABASE <database name>;
USE <database name>;
```

Creating copies of databases in MotherDuck (with `CREATE DATABASE ... FROM ...`) is a metadata-only operation that copies no data.
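As a minimal sketch of that zero-copy pattern (database names here are hypothetical; see the CREATE DATABASE reference for the exact form):

```sql
-- clone an existing MotherDuck database without copying any data
CREATE DATABASE dev_copy FROM DATABASE prod_db;
```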
## Listing databases

```sql
-- returns all connected local and remote databases
SHOW DATABASES;

-- returns current database
SELECT current_database();
```

## Delete database

```sql
USE <another database>;
DROP DATABASE <database name>;
```

Example usage:

```sql
> SHOW DATABASES;
test01

-- Let's put two different t1 tables into two different databases
> CREATE TABLE test01.t1 AS (SELECT range AS r FROM range(12));
> SELECT * FROM t1;

-- now for the other database
> CREATE DATABASE test02;
> CREATE TABLE test02.t1 AS (SELECT 'test02' AS dbname);

-- show the databases we've created
> SHOW DATABASES;
test01
test02
```

---

---
title: Database operations
description: Learn how to work with databases and MotherDuck
---

import DocCardList from '@theme/DocCardList';

---

---
sidebar_position: 12
title: Detach and re-attach a MotherDuck database
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

After [creating a remote MotherDuck database](/sql-reference/motherduck-sql-reference/create-database.md), the [`DETACH <database name>` command](/sql-reference/motherduck-sql-reference/detach-database.md) may be used to detach it. This will prevent access and modifications to the database until it is re-attached using the [`ATTACH <database name>` command](/sql-reference/motherduck-sql-reference/attach-database.md). This pattern can be used to isolate queries and changes to a specific set of databases.

Note that this is a convenience feature and not a security feature, as a MotherDuck database may be reattached at any time. Database shares behave slightly differently than non-shared databases, so if you want to `ATTACH` and `DETACH` shares, please have a look at how to [manage shared MotherDuck databases](/key-tasks/sharing-data/sharing-data.mdx).

## Creating, detaching, and re-attaching a database

This guide will show how to `CREATE`, `DETACH`, and `ATTACH` a database using the CLI and the UI.

```sql
CREATE DATABASE my_new_md_database;
DETACH my_new_md_database;
ATTACH 'my_new_md_database';
-- OR
ATTACH 'md:my_new_md_database';
```

To create a database, add a new cell and enter the SQL command `CREATE DATABASE <database name>`. Click the Run button.

![create_database](./img/create_database.png)

Click on the menu of the database you would like to detach and select `Detach`.

![detach_database](./img/detach_database.png)

The database will be moved to the "Detached Databases" section of the object explorer.

![detached_databases](./img/detached_databases.png)

To re-attach, click on the menu of the database in the "Detached Databases" section and select `Attach`.

![attach_database](./img/attach_database.png)

The database will be returned to the "My Databases" section.

![my_databases_post_attach](./img/my_databases_post_attach.png)

## Show All Databases

To see all databases, both attached and detached, use the [`SHOW ALL DATABASES` command](/sql-reference/motherduck-sql-reference/show-databases.md).

```sql
SHOW ALL DATABASES;
```

Example output:

```bash
┌──────────────────────────────────────────┬─────────────┬──────────────────┬──────────────────────────────────────────────────────────────────────┐
│                  alias                   │ is_attached │       type       │                         fully_qualified_name                          │
│                 varchar                  │   boolean   │     varchar      │                                varchar                                │
├──────────────────────────────────────────┼─────────────┼──────────────────┼──────────────────────────────────────────────────────────────────────┤
│ TEST_DB_02d6fc2158094bd693b6f285dbd402f7 │ true        │ motherduck       │ md:TEST_DB_02d6fc2158094bd693b6f285dbd402f7                          │
│ TEST_DB_62b53d968a4f4b6682ed117a7251b814 │ true        │ motherduck       │ md:TEST_DB_62b53d968a4f4b6682ed117a7251b814                          │
│ base                                     │ false       │ motherduck       │ md:base                                                              │
│ base2                                    │ true        │ motherduck       │ md:base2                                                             │
│ db1                                      │ false       │ motherduck       │ md:db1                                                               │
│ integration_test_001                     │ false       │ motherduck       │ md:integration_test_001                                              │
│ my_db                                    │ true        │ motherduck       │ md:my_db                                                             │
│ my_share_1                               │ true        │ motherduck share │ md:_share/integration_test_001/18d6dbdb-e130-4cdf-97c4-60782ed5972b  │
│ sample_data                              │ false       │ motherduck       │ md:sample_data                                                       │
│ source_db                                │ true        │ motherduck       │ md:source_db                                                         │
│ test_db_115                              │ false       │ motherduck       │ md:test_db_115                                                       │
│ test_db_28d                              │ false       │ motherduck       │ md:test_db_28d                                                       │
│ test_db_cc9                              │ false       │ motherduck       │ md:test_db_cc9                                                       │
│ test_share                               │ true        │ motherduck share │ md:_share/source_db/b990b424-2f9a-477a-b216-680a22c3f43f             │
│ test_share_002                           │ true        │ motherduck share │ md:_share/integration_test_001/06cc5500-e49a-4f62-9203-105e89a4b8ae  │
├──────────────────────────────────────────┴─────────────┴──────────────────┴──────────────────────────────────────────────────────────────────────┤
│ 15 rows (15 shown)                                                                                                                     4 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

---
sidebar_position: 2.2
title: Specifying different databases
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
MotherDuck enables you to specify an active/current database and an active/current schema within that database.

Queryable objects (e.g. tables) that belong to the current database are resolved with just `<object name>`. MotherDuck will automatically search all schemas within the current database. If there are overlapping names within different schemas, objects can be qualified with `<schema name>.<object name>`.

Queryable objects in your account outside of the active/current database are resolved with `<database name>.<object name>`. However, if a schema in the current database shares the same name as another database, the fully qualified name must be used: `<database name>.<schema name>.<object name>` (an error will be thrown to indicate the ambiguity). This applies to databases that live in MotherDuck as well as in your local DuckDB environment.

For example:

```sql
-- check your current database
SELECT current_database();
dbname

-- check your current schema
SELECT current_schema();
main

-- query a table mytable that exists in the current database dbname
SELECT count(*) FROM mytable;
34

-- query a table mytable2 that exists in the database dbname2
SELECT count(*) FROM dbname2.mytable2;
41

-- query a table mytable3 that exists in schema2
-- note that the syntax is identical to the database name syntax above and
-- MotherDuck will detect whether a database or schema is involved
SELECT count(*) FROM schema2.mytable3;
42

-- query a table in another database when a schema exists with the same name in the current database
-- (overlappingname is both a database name and a schema name)
SELECT count(*) FROM overlappingname.myschemaname.mytable4;
43
```

You can also reference local databases in the same MotherDuck queries. This type of query is known as a [hybrid query](/key-tasks/running-hybrid-queries.md).

To change the active database, schema, or database/schema combination, execute a `USE` command. See the documentation on [switching the current database](./switching-the-current-database.md) for details.
---

---
sidebar_position: 3
title: Switching the current database
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Below are examples of how to determine the current/active database and schema and switch between different databases and schemas:

```sql
-- check your current database
SELECT current_database();
dbname

-- list all tables in the current database
SHOW TABLES;
table1
table2

-- list all databases
SHOW DATABASES;
dbname
dbname2

-- switch to database named 'dbname2'
USE dbname2;

-- verify that you've successfully switched databases
SELECT current_database();
dbname2

-- check your current schema
SELECT current_schema();
main

-- list all schemas across all databases
SELECT * FROM duckdb_schemas();
```

| oid  | database_name | database_oid | schema_name        | internal | sql  |
|------|---------------|--------------|--------------------|----------|------|
| 986  | my_db         | 989          | information_schema | true     | NULL |
| 974  | my_db         | 989          | main               | false    | NULL |
| 972  | my_db         | 989          | my_schema          | false    | NULL |
| 987  | my_db         | 989          | pg_catalog         | true     | NULL |
| 1508 | system        | 0            | information_schema | true     | NULL |
| 0    | system        | 0            | main               | true     | NULL |
| 1509 | system        | 0            | pg_catalog         | true     | NULL |
| 1510 | temp          | 1453         | information_schema | true     | NULL |
| 1454 | temp          | 1453         | main               | true     | NULL |
| 1511 | temp          | 1453         | pg_catalog         | true     | NULL |

```sql
-- switch to schema my_schema within the same database
USE my_schema;

-- verify that you've successfully switched schemas
SELECT current_schema();
my_schema

-- switch to database my_db and schema main
USE my_db.main;

-- verify that both the database and schema have been changed
SELECT current_database(), current_schema();
```

| current_database() | current_schema() |
|--------------------|------------------|
| my_db              | main             |

---

---
title: How-to guides
sidebar_class_name: how-to-guide-icon
description: How-to guides
---

import DocCardList from '@theme/DocCardList';

---

---
sidebar_position: 2
title: From Cloud Storage or over HTTPS
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# From Cloud Storage or over HTTPS

## From Public Cloud Storage

MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2).

:::note
MotherDuck is currently hosted in the Amazon AWS region `us-east-1`. We strongly encourage you to locate your data in this region when working with MotherDuck.
:::

The following example features Amazon S3. Connect to MotherDuck if you haven't already by doing the following:

```sql
-- assuming the db my_db exists
ATTACH 'md:my_db';
```

```sql
-- CTAS a table from a publicly available demo dataset stored in s3
CREATE OR REPLACE TABLE pypi_small AS
SELECT * FROM 's3://motherduck-demo/pypi.small.parquet';

-- JOIN the demo dataset against a larger table to find the most common duplicate urls
-- Note you can directly refer to the url as a table!
SELECT pypi_small.url, COUNT(*)
FROM pypi_small
JOIN 's3://motherduck-demo/pypi_downloads.parquet' AS s3_pypi
  ON pypi_small.url = s3_pypi.url
GROUP BY pypi_small.url
ORDER BY COUNT(*) DESC
LIMIT 10;
```

## From a Secure Cloud Storage Provider

MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2).

```sql
CREATE SECRET IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'access_key',
    SECRET 'secret_key',
    REGION 'us-east-1'
);

-- Now you can query from a secure S3 bucket
CREATE OR REPLACE TABLE mytable AS SELECT * FROM 's3://...';
```

## Over HTTPS

MotherDuck supports loading data over HTTPS.

```sql
-- Reads the Central Park Squirrel Data
SELECT * FROM read_csv_auto('https://docs.google.com/spreadsheets/d/e/2PACX-1vQUZR6ikwZBRXWWQsFaUceEiYzJiVw4OQNGtwGBfcMfVatpCyfxxaWPdoKJIHlwNM-ow1oeW_2F-pO5/pub?gid=2035607922&single=true&output=csv');
```

---

---
sidebar_position: 0.9
title: From Your Local Machine
description: Moving data from local to MotherDuck through the UI or programmatically.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Single file

Using the CLI, you can connect to MotherDuck, create a database, and load a single local file (JSON, Parquet, CSV, etc.) into a MotherDuck table. First, connect to MotherDuck using the `ATTACH` command.

```sql
ATTACH 'md:';
```

Create a cloud database (or point to any existing one) and load a local file into a table.

```sql
CREATE DATABASE test01;
USE test01;
CREATE OR REPLACE TABLE orders AS SELECT * FROM 'orders.csv';
```

In the MotherDuck UI, you can add JSON, CSV or Parquet files directly using the **Add Files** button in the top left of the UI. See the [Getting Started Tutorial](../../../getting-started/e2e-tutorial#loading-your-dataset) for details.

## Multiple files or database

To upload multiple files at once, or data in other formats supported by DuckDB, you can use the DuckDB CLI or any other supported [DuckDB client](https://duckdb.org/docs/data/multiple_files/overview.html). If all of your files belong in a single table, you can use the [glob syntax to load all files into a single table](https://duckdb.org/docs/data/multiple_files/overview.html). For example, to load all CSV files from a directory into a single table, you can use the following SQL command:

```sql
ATTACH 'md:';
CREATE DATABASE test01;
USE test01;
CREATE OR REPLACE TABLE orders AS SELECT * FROM 'dir/*.csv';
```

If your files are in different formats or you want to load them into different tables, you can first load the files into different tables in a local DuckDB database and then copy the entire database into MotherDuck. To copy the entire local DuckDB database into MotherDuck, you can use the following SQL commands:

```sql
ATTACH 'md:';
```

```sql
ATTACH 'local.ddb';
CREATE DATABASE cloud_db FROM 'local.ddb';
```

---

---
sidebar_position: 11
title: From a PostgreSQL or MySQL Database
description: Learn to load a table from your PostgreSQL or MySQL database into MotherDuck.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Using PostgreSQL or MySQL DuckDB Extensions

DuckDB's [PostgreSQL extension](https://duckdb.org/docs/extensions/postgres.html) and [MySQL extension](https://duckdb.org/docs/extensions/mysql.html) make it extremely easy to connect to and access data stored in your OLTP databases.
Once connected, you can just as easily export the data to MotherDuck to offload analytical queries while benefiting from data centralization, persistence, and data sharing capabilities. In this guide we will demonstrate this workflow with the PostgreSQL extension. Consult the [DuckDB MySQL extension documentation](https://duckdb.org/docs/extensions/mysql) to make adjustments to the steps to work with MySQL databases.

:::info
MotherDuck does not yet support the PostgreSQL and MySQL extensions, so you need to perform the following steps on your own computer or cloud computing resource. We are working on supporting the PostgreSQL extension on the server side so that this can happen within the MotherDuck app in the future with improved performance.
:::

### Prerequisites

- **PostgreSQL Database Credentials**: Ensure you have access details to the PostgreSQL database, including host address, port, and user credentials. You can put the user credentials in the [PostgreSQL Password File](https://www.postgresql.org/docs/current/libpq-pgpass.html), [store them in environment variables](https://duckdb.org/docs/extensions/postgres.html#configuring-via-environment-variables), or pass them inline in the script below.
- **Network Connectivity**: Your machine must be able to connect to the target PostgreSQL database.
- **MotherDuck Credentials**: MotherDuck credentials should be set up. If not, follow the steps in [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).
- **DuckDB**: Either the DuckDB command-line interface or Python + the DuckDB package should be installed and operational. See the [Getting Started tutorials](../../getting-started/getting-started.mdx) for instructions to install DuckDB.

### Steps

The following SQL script reads from a table in the PostgreSQL database and writes it to the table named `my_db.pg_data_schema.first_pg_table` in MotherDuck. Fill in the placeholders `<database name>`, `<host>`, `<user>`, `<password>`, `<schema name>`, `<table name>`, and `<row limit>` with the appropriate values and save the script to a file, e.g., `ingest_data_from_postgres.sql`.

```sql
-- Connect to a MotherDuck database.
ATTACH 'md:';
USE my_db;

-- Optionally create a schema; by default MotherDuck uses the main schema.
CREATE SCHEMA IF NOT EXISTS pg_data_schema;

-- Ingest data from PostgreSQL to a MotherDuck table
CREATE OR REPLACE TABLE pg_data_schema.first_pg_table AS
SELECT * FROM postgres_scan('dbname=<database name> host=<host> user=<user> password=<password> connect_timeout=10', '<schema name>', '<table name>')
-- optionally limit the number of rows ingested
LIMIT <row limit>;

-- Optional: Verify the number of rows in the MotherDuck table
SELECT count(1) FROM pg_data_schema.first_pg_table;
```

#### Run with DuckDB CLI

After filling out the placeholders, you can either execute the statements line by line in the DuckDB CLI, or save the commands in a file, e.g., `ingest_data_from_postgres.sql`, and run:

```sh
> duckdb < ingest_data_from_postgres.sql
```

#### Run with Python

You can also execute it using Python with the DuckDB package.

```python
import duckdb

with open("ingest_data_from_postgres.sql", 'r') as f:
    s = f.read()

duckdb.sql(s)
```

After completing these steps, you should see the new table show up in the MotherDuck Web UI.

## Using a MotherDuck Integration Partner

MotherDuck collaborates with various integration partners to facilitate data transfer in diverse ways, including change data capture (CDC), from your PostgreSQL or MySQL database to MotherDuck. For example, you can refer to our [Estuary guide](https://motherduck.com/blog/streaming-data-to-motherduck/) that demonstrates how to stream data from Neon, a PostgreSQL-based database, to MotherDuck. To explore the full range of solutions tailored to your needs, visit our [MotherDuck ecosystem partners page](https://motherduck.com/ecosystem/).

---

---
title: Loading Data into MotherDuck
description: Learn how to load data into MotherDuck from various sources
---

You can leverage MotherDuck’s managed storage to persist your data. MotherDuck storage provides a high level of manageability and abstraction, optimizing your data for secure, durable, performant, and efficient use. There are several ways to load data into MotherDuck storage.

import DocCardList from '@theme/DocCardList';

---

---
sidebar_position: 1
title: Loading data to MotherDuck with Python
---

# Loading data to MotherDuck with Python

As you ingest data using Python, typically coming from APIs or other sources, you have different options to load data to MotherDuck:

1. (fast) Use a Pandas/Polars/PyArrow dataframe as an in-memory buffer before bulk loading to MotherDuck.
2. (fast) Write to a temporary file and load it to MotherDuck using a `COPY` command.
3. (slow) Use the `executemany` method to perform several `INSERT` statements in a single transaction.

Option `1` is the easiest, as dataframe libraries are optimized for bulk inserts. Option `2` involves writing to disk, but the `COPY` command is faster than `INSERT` statements. Option `3` should be discouraged unless the data is very small (< 500 rows).

:::tip
Whichever option you pick, we recommend loading data in chunks (typically `100k` rows to match the row group size) to avoid memory issues and to keep transactions from growing too large; a transaction should typically finish within about a minute.
:::

:::info
Alongside the recommendations below, we suggest reading our guidelines on [connections](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md) and [threading](/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-python.md), which will help you optimize your data loading process.
:::

## 1. Using Pandas/Polars/PyArrow to load data to MotherDuck

When using a dataframe library, you can load data to MotherDuck in a single transaction.
```python
import duckdb
import pyarrow as pa

# Create a PyArrow table
data = {
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva']
}
arrow_table = pa.table(data)

con = duckdb.connect('md:')
con.sql('CREATE TABLE my_table AS SELECT * FROM arrow_table')
```

### Buffering data

When you have a large dataset, it's recommended you chunk your data and load it in batches. This will help you avoid memory issues and make sure your transactions do not grow too large. Here's an example class that loads data in chunks using PyArrow and DuckDB.

```python
import duckdb
import os
import pyarrow as pa
import logging

# Setup basic configuration for logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

class ArrowTableLoadingBuffer:
    def __init__(
        self,
        duckdb_schema: str,
        pyarrow_schema: pa.Schema,
        database_name: str,
        table_name: str,
        destination="local",
        chunk_size: int = 100_000,  # Default chunk size
    ):
        self.duckdb_schema = duckdb_schema
        self.pyarrow_schema = pyarrow_schema
        self.database_name = database_name
        self.table_name = table_name
        self.total_inserted = 0
        self.conn = self.initialize_connection(destination, duckdb_schema)
        self.chunk_size = chunk_size

    def initialize_connection(self, destination, sql):
        if destination == "md":
            logging.info("Connecting to MotherDuck...")
            if not os.environ.get("motherduck_token"):
                raise ValueError(
                    "MotherDuck token is required. Set the environment variable 'motherduck_token'."
                )
            conn = duckdb.connect("md:")
            logging.info(
                f"Creating database {self.database_name} if it doesn't exist"
            )
            conn.execute(f"CREATE DATABASE IF NOT EXISTS {self.database_name}")
            conn.execute(f"USE {self.database_name}")
        else:
            conn = duckdb.connect(database=f"{self.database_name}.db")
        conn.execute(sql)  # Execute schema setup on initialization
        return conn

    def insert(self, table: pa.Table):
        total_rows = table.num_rows
        for batch_start in range(0, total_rows, self.chunk_size):
            batch_end = min(batch_start + self.chunk_size, total_rows)
            chunk = table.slice(batch_start, batch_end - batch_start)
            self.insert_chunk(chunk)
            logging.info(f"Inserted chunk {batch_start} to {batch_end}")
        self.total_inserted += total_rows
        logging.info(f"Total inserted: {self.total_inserted} rows")

    def insert_chunk(self, chunk: pa.Table):
        self.conn.register("buffer_table", chunk)
        insert_query = f"INSERT INTO {self.table_name} SELECT * FROM buffer_table"
        self.conn.execute(insert_query)
        self.conn.unregister("buffer_table")
```

Using the above class, you can load your data in chunks.
```python
import pyarrow as pa

# Define the explicit PyArrow schema
pyarrow_schema = pa.schema([
    ('id', pa.int32()),
    ('name', pa.string())
])

# Sample data to create a PyArrow table based on the schema
data = {
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva']
}
arrow_table = pa.table(data, schema=pyarrow_schema)

# Define the DuckDB schema as a DDL statement
duckdb_schema = "CREATE TABLE IF NOT EXISTS my_table (id INTEGER, name VARCHAR)"

# Initialize the loading buffer
loader = ArrowTableLoadingBuffer(
    duckdb_schema=duckdb_schema,
    pyarrow_schema=pyarrow_schema,
    database_name="my_db",  # The DuckDB database filename or MotherDuck database name
    table_name="my_table",  # The name of the table in DuckDB or MotherDuck
    destination="md",       # Set "md" for MotherDuck or "local" for a local DuckDB database
    chunk_size=2            # Example chunk size for illustration
)

# Load the data
loader.insert(arrow_table)
```

### Typing your dataset

When working with production pipelines, it's recommended to type your dataset to avoid issues with type inference. PyArrow is our recommended way to do this, as it makes explicit typing straightforward, especially for complex data types. In the above example, the schema is defined explicitly on both the PyArrow table and the DuckDB table.

```python
# Initialize the loading buffer
loader = ArrowTableLoadingBuffer(
    duckdb_schema=duckdb_schema,    # prepare a DuckDB DDL statement
    pyarrow_schema=pyarrow_schema,  # define your PyArrow schema explicitly
    database_name="my_db",
    table_name="my_table",
    destination="md",
    chunk_size=2
)
```

## 2. Write to a temporary file and load it to MotherDuck using a `COPY` command

When you have a large dataset, another method is to write your data to temporary files and load them into MotherDuck using a `COPY` command. This also works great if you have existing data on blob storage like AWS S3, Google Cloud Storage or Azure Blob Storage, as you will benefit from cloud network speed.

```python
import pyarrow as pa
import pyarrow.parquet as pq
import duckdb
import os

# Step 1: Define the schema and create a large PyArrow table
schema = pa.schema([
    ('id', pa.int32()),
    ('name', pa.string())
])

# Example data - multiply the data to simulate a large dataset
data = {
    'id': list(range(1, 1000001)),  # Simulating 1 million rows
    'name': ['Name_' + str(i) for i in range(1, 1000001)]
}

# Create the PyArrow table with the schema
large_table = pa.table(data, schema=schema)

# Step 2: Write the large PyArrow table to a Parquet file
parquet_file = "large_data.parquet"
pq.write_table(large_table, parquet_file)

# Step 3: Load the Parquet file into MotherDuck using the COPY command
conn = duckdb.connect("md:")  # Connect to MotherDuck
conn.execute("CREATE TABLE IF NOT EXISTS my_table (id INTEGER, name VARCHAR)")

# Use the COPY command to load the Parquet file into MotherDuck
conn.execute(f"COPY my_table FROM '{os.path.abspath(parquet_file)}' (FORMAT 'parquet')")

print("Data successfully loaded into MotherDuck")
```
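If your data already lives on object storage, you can skip the local temporary file entirely and `COPY` straight from there. A minimal sketch, assuming an S3 secret is already configured (see the cloud storage guide) and using a hypothetical bucket path:

```sql
-- load directly from object storage; requires read access to the bucket
COPY my_table FROM 's3://my-bucket/exports/large_data.parquet' (FORMAT 'parquet');
```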
---

---
sidebar_position: 4
title: Load a DuckDB database into MotherDuck
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

MotherDuck supports uploading local DuckDB databases into the cloud, as referenced by the [CREATE DATABASE](/sql-reference/motherduck-sql-reference/create-database.md) statement.

To create a remote database from the current active local database, execute the following command:

```sql
CREATE OR REPLACE DATABASE remote_database_name FROM CURRENT_DATABASE();
```

To upload a different local named database, execute the following command:

```sql
CREATE OR REPLACE DATABASE remote_database_name FROM '<local database name>';
```

Here's a full end-to-end example:

```sql
-- Let's generate some data based on the tpch extension (it will be autoloaded automatically).
-- This will create a couple of tables in the current database.
CALL dbgen(sf=0.1);

-- Connect to MotherDuck
ATTACH 'md:';
CREATE OR REPLACE DATABASE remote_tpch FROM CURRENT_DATABASE();
```

:::note
Uploading a database does not alter your context, meaning you are still in the local context after the upload, and queries will continue to run locally.
:::
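To actually work against the uploaded copy, switch to it explicitly. A minimal sketch, continuing the example above (`lineitem` is one of the tables `dbgen` creates):

```sql
-- switch context to the remote database so subsequent queries run in MotherDuck
USE remote_tpch;
SELECT count(*) FROM lineitem;
```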
---

---
title: Managing Organizations
description: Learn how to manage your Organization with MotherDuck
---

An Organization is a top-level entity in MotherDuck that enables you to perform administrative functions, such as managing users, setting up billing, configuring sharing, monitoring security, and so on. A MotherDuck user can only belong to a single Organization at a time.

Currently, Organizations are helpful for:

- Grouping users together for tracking usage and billing.
- Sharing data with other users in an organization.

## Creating an Organization

If you already have a MotherDuck account, an Organization was already created for you by MotherDuck. If you are a new MotherDuck user, during sign-up you will be prompted to create a new Organization.

![create_org](./img/create_org.png)

:::note
If another coworker at your company already has an organization, you can create your own organization to get started with MotherDuck right away, and then ask them to invite you to their organization later (see ["Joining an Existing Organization"](#joining-an-existing-organization) below).
:::

## Inviting Users to Your Organization

You can check if your teammates are in your Organization by navigating to the MotherDuck Web UI -> "Settings" -> "Members". There you may also invite your teammates to join your Organization. You may invite both teammates without a MotherDuck account and existing MotherDuck users.

![members](./img/members.png)

## Enabling All Users in a Domain to Join Your Organization

You can enable all users with an email address in the same domain as yours to join your MotherDuck Organization. This is accomplished by navigating to the MotherDuck Web UI -> "Settings" -> "Organization".

![all in domain flag](./img/all_in_domain.jpg)

This setting is currently available for all users in the Organization to set and is based on the domain of the user selecting this option. Common public email hosts (e.g. gmail.com) are not supported.

## Joining an Existing Organization

If you'd like to join your teammates' existing MotherDuck Organization, you must be invited by an Administrator in that Organization. Once an invite is generated, you will receive an email with a link to join the Organization. Alternatively, the Organization can allow [all users in a specific email domain](#enabling-all-users-in-a-domain-to-join-your-organization) to join.

## Roles

Within an Organization a user can have an "Admin" or "Member" role. The first user in an organization will be the "Admin" and subsequent users will have the "Member" role. "Admin" users can change the roles of other users in the organization or "Remove" a user from the organization.

:::note
In the future, sending invitations, changing between plans, or updating billing information will require an "Admin" role.
:::

## Removing Users

If a user leaves your team or no longer needs access, "Admin" users can "Remove" them from the organization to restrict data access or clean up resources that are no longer used. This is done from the context menu in the ["Members" table](https://app.motherduck.com/settings/members).

:::warning
Because a user can only belong to one organization, removing them from the organization permanently deletes the user and all of their data. This action cannot be undone.
:::

## Limitations and Upcoming Improvements

Currently Organizations have the following limitations:

- It is not possible to explore existing Organizations. Please reach out to other MotherDuck users at your company or [contact us](../../troubleshooting/support.md) if you would like to find other users at your company.

---

---
sidebar_position: 8
title: Running dual (or hybrid) queries
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

MotherDuck can use local data and remote data in the same query.

**Example:**

Run the DuckDB CLI.

```bash
duckdb
```

Connect to MotherDuck. You may be prompted to sign in if you aren't already. The same statements can also be run in a MotherDuck notebook.

```sql
ATTACH 'md:';
```

Create a local database in memory.

```sql
ATTACH ':memory:' AS local_db;
CREATE TABLE local_db.pricing AS
FROM (VALUES ('A', 1.4), ('B', 1.12), ('C', 2.552), ('D', 5.23)) pricing(item, price);
FROM local_db.pricing;
```

```bash
┌─────────┬──────────────┐
│  item   │    price     │
│ varchar │ decimal(4,3) │
├─────────┼──────────────┤
│ A       │        1.400 │
│ B       │        1.120 │
│ C       │        2.552 │
│ D       │        5.230 │
└─────────┴──────────────┘
```

Create a remote database in MotherDuck.

```sql
CREATE OR REPLACE DATABASE remote_db;
CREATE TABLE remote_db.sales AS
SELECT
    'ABCD'[floor(random() * 3.999)::int + 1] AS item,
    current_date() - interval (random() * 100) days AS dt,
    floor(random() * 50)::int AS tally
FROM generate_series(1000);
FROM remote_db.sales LIMIT 10;
```

```bash
┌─────────┬─────────────────────┬───────┐
│  item   │         dt          │ tally │
│ varchar │      timestamp      │ int32 │
├─────────┼─────────────────────┼───────┤
│ D       │ 2024-11-29 00:00:00 │     0 │
│ A       │ 2024-10-04 00:00:00 │    17 │
│ A       │ 2024-10-13 00:00:00 │     0 │
│ C       │ 2024-11-05 00:00:00 │    49 │
│ A       │ 2024-09-30 00:00:00 │    12 │
│ B       │ 2024-09-27 00:00:00 │    47 │
│ C       │ 2024-11-23 00:00:00 │    47 │
│ B       │ 2024-09-18 00:00:00 │    13 │
│ A       │ 2024-11-18 00:00:00 │    40 │
│ C       │ 2024-09-18 00:00:00 │     4 │
├─────────┴─────────────────────┴───────┤
│ 10 rows                     3 columns │
└───────────────────────────────────────┘
```

Join the remote sales table to our local pricing data to get revenue by month.
```sql SELECT date_trunc('month', dt) AS mo, round(sum(price * tally),2) AS rev FROM remote_db.sales JOIN (FROM local_db.pricing WHERE price > 2) pricing ON sales.item = pricing.item GROUP BY mo ORDER BY mo; ``` ```bash ┌────────────┬───────────────┐ │ mo │ rev │ │ date │ decimal(38,2) │ ├────────────┼───────────────┤ │ 2024-09-01 │ 9241.39 │ │ 2024-10-01 │ 14226.12 │ │ 2024-11-01 │ 13136.55 │ │ 2024-12-01 │ 7783.26 │ └────────────┴───────────────┘ ``` To see what is running locally and remotely, you can use EXPLAIN: ```sql EXPLAIN SELECT date_trunc('month', dt) AS mo, round(sum(price * tally),2) AS rev FROM remote_db.sales JOIN (FROM local_db.pricing WHERE price > 2) pricing ON sales.item = pricing.item GROUP BY mo ORDER BY mo; ``` In each operator of the plan, `(L)` indicates local while `(R)` indicates remote. Data is transferred using sinks and sources. ```bash ┌─────────────────────────────┐ │┌───────────────────────────┐│ ││ Physical Plan ││ │└───────────────────────────┘│ └─────────────────────────────┘ ┌───────────────────────────┐ │ DOWNLOAD_SOURCE (L) │ │ ──────────────────── │ │ bridge_id: 1 │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ BATCH_DOWNLOAD_SINK (R) │ │ ──────────────────── │ │ bridge_id: 1 │ │ parallel: true │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ ORDER_BY (R) │ │ ──────────────────── │ │ date_trunc('month', sales │ │ .dt) ASC │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION (R) │ │ ──────────────────── │ │ 0 │ │ rev │ │ │ │ ~125 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ HASH_GROUP_BY (R) │ │ ──────────────────── │ │ Groups: #0 │ │ Aggregates: sum(#1) │ │ │ │ ~125 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION (R) │ │ ──────────────────── │ │ mo │ │ (CAST(price AS DECIMAL(14 │ │ ,3)) * CAST(tally AS │ │ DECIMAL(14,0))) │ │ │ │ ~250 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION (R) │ │ ──────────────────── │ │ #0 │ │ #1 │ │ #2 │ │__internal_compress_string_│ │ utinyint(#3) │ │ #4 │ │ │ │ ~250 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ HASH_JOIN (R) │ │ ──────────────────── │ │ Join Type: INNER │ │ │ │ Conditions: ├──────────────┐ │ item = item │ │ │ │ │ │ ~250 Rows │ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ SEQ_SCAN (R) ││ UPLOAD_SOURCE (R) │ │ ──────────────────── ││ ──────────────────── │ │ sales ││ bridge_id: 2 │ │ ││ │ │ Projections: ││ │ │ item ││ │ │ dt ││ │ │ tally ││ │ │ ││ │ │ ~1001 Rows ││ │ └───────────────────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ BATCH_UPLOAD_SINK (L) │ │ ──────────────────── │ │ bridge_id: 2 │ │ parallel: true │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION (L) │ │ ──────────────────── │ │ item │ │ price │ │ │ │ ~1 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ SEQ_SCAN (L) │ │ ──────────────────── │ │ pricing │ │ │ │ Projections: │ │ price │ │ item │ │ │ │ Filters: │ │ price>2.000 AND price IS │ │ NOT NULL │ │ │ │ ~1 Rows │ └───────────────────────────┘ ``` A dual (or hybrid) query can be run on any database format supported by DuckDB, including [sqlite](https://duckdb.org/docs/extensions/sqlite_scanner), [postgres](https://duckdb.org/docs/extensions/postgres_scanner) and many others. 
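As a rough sketch of that idea with SQLite (the file name and table names here are hypothetical; the sqlite extension can be autoloaded by DuckDB):

```sql
-- attach a local SQLite file next to a MotherDuck database
ATTACH 'my_app.sqlite' AS app_db (TYPE SQLITE);
ATTACH 'md:my_db';

-- join local application data against a cloud table
SELECT c.name, count(*) AS orders
FROM app_db.orders AS o
JOIN my_db.main.customers AS c ON o.customer_id = c.id
GROUP BY c.name;
```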
---

---
sidebar_position: 4
title: Managing shares
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Getting details about a share

You can learn more about a specific share that you’ve created by using the [`DESCRIBE SHARE`](/sql-reference/motherduck-sql-reference/describe-share.md) command. For example:

```sql
DESCRIBE SHARE "duckshare";
```

In the UI you can roll over a share to see a tooltip that tells you the share owner, when it was last updated, and the access scope.

## Listing Shares

You can list the shares you have created via the [`LIST SHARES`](/sql-reference/motherduck-sql-reference/list-shares.md) statement. For example:

```sql
LIST SHARES;
```

1. You can see shares that you've created under "Shares I've created".
2. You can find **Discoverable** **Organization** shares that members of your Organization created under "Shared with me".

To view the URLs of shares created by others that you have currently attached, use the [`SHOW ALL DATABASES`](/sql-reference/motherduck-sql-reference/show-databases/) command. The `fully_qualified_name` column gives you the share URL of the attached share.

## Deleting a share

Shares can be deleted with the [`DROP SHARE`](/sql-reference/motherduck-sql-reference/drop-share.md) or `DROP SHARE IF EXISTS` statement. Users who have [`ATTACH`](/sql-reference/motherduck-sql-reference/attach-share.md)-ed the share will lose access. For example:

```sql
DROP SHARE "share1";
```

1. Roll over the share you'd like to delete.
2. Click on the "trident" on the right side.
3. Select "Drop".
4. Confirm.

## Updating a share

Sharing a database creates a point-in-time snapshot of the database at the time it is shared. To publish changes, you need to explicitly run `UPDATE SHARE <share name>`. When updating a `SHARE` with the same database, the URL does not change.

```sql
UPDATE SHARE <share name>;
```

In the following example, the database ‘mydb’ was previously shared by creating a share ‘myshare’, and the database ‘mydb’ has been updated since. The owner of the database would like their colleagues to receive the new version of this database:

```sql
-- 'myshare' was previously created on the database 'mydb'
UPDATE SHARE "myshare";
```

If you lost your database share URL, you can use the `LIST SHARES` command to list all your shares, or `DESCRIBE SHARE <share name>` to get specific details about a given share.

## Editing/Altering a share

You can change the configuration of shares you've created in the UI. The SQL operation `ALTER SHARE` is in the works.

1. Roll over the share you'd like to edit.
2. Click on the "trident" on the right side.
3. Select "Alter".
4. Change the share configuration as you see fit.
5. Confirm "Alter share".

---

---
title: Sharing data in MotherDuck
description: Learn how to securely share data in MotherDuck
---

You can easily and securely share data in MotherDuck. MotherDuck's sharing model is specifically optimized for the following scenarios:

- Sharing data with everyone in your Organization for easy discovery and low-friction access. Typical of small, highly collaborative data teams.
- Sharing data with specific accounts within your Organization. Popular with data application builders needing to isolate tenants.
- Sharing data publicly, with any MotherDuck account inside or outside of your Organization.
import DocCardList from '@theme/DocCardList';

---

---
sidebar_position: 1
title: Sharing concepts and overview
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Sharing data in MotherDuck

MotherDuck's data sharing model currently has the following key characteristics:

- Sharing is at the granularity of a MotherDuck database.
- Sharing is currently read-only.
- Sharing is done through **share** objects.
- You can make shares easily discoverable and queryable by all users in your [Organization](../managing-organizations/managing-organizations.mdx).
- Alternatively, you can use share URLs to limit whom you share the data with.

Sharing in MotherDuck works as follows:

1. The **data provider** shares their database in MotherDuck by creating a share.
2. The **data consumer** attaches said share, which creates a clone database in their workspace. The data consumer can now query this database.
3. The **data provider** periodically updates the share to push updates to the database to **data consumers**.

## Creating a share

The first step in sharing databases in MotherDuck is to create a share, which can be done in both the UI and SQL. Creating a share does not incur additional costs, and no actual data is copied or transferred: creating a share is a zero-copy, metadata-only operation.

Click on the "trident" next to the database you'd like to share. Select "share". Then:

![trident](./img/ui-share_new.png)

1. Optionally, choose a share name. The default will be the database name.
2. Choose whether the share should only be accessible by users in your Organization, or anyone with the share link.
3. Choose whether the share should be automatically updated or not. The default is `MANUAL`.

The following example creates a share from database "birds":

- The share is also named "birds".
- This share can only be accessed by accounts authenticated in your [Organization](../managing-organizations/managing-organizations.mdx).
- This share is discoverable. Users in your Organization will be able to easily find this share.

```sql
USE birds;
CREATE SHARE; -- Shorthand syntax. Share name is optional. By default, shares are Organization-scoped and Discoverable.
CREATE SHARE IF NOT EXISTS birds FROM birds (ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE, UPDATE MANUAL); -- This query is identical to the previous one but with explicit options.
```

Learn more about the [CREATE SHARE](/sql-reference/motherduck-sql-reference/create-share.md) SQL command.

### Organization shares

When creating a share, you may choose the scope of access to this share:

- **Organization**. Only users authenticated in your Organization will have access to this share.
- **Unrestricted**. Any user signed into MotherDuck can access this share using the share URL.

### Discoverable shares

When creating a share, you may choose to make this share **Discoverable**. All authenticated users in your Organization will be able to easily find this share in the UI. You can create **Discoverable** shares that are **Unrestricted**, but only members of your Organization can find this share in the UI. Non-members can still access this share using the share URL.

### Share URLs

When you create a share, a URL for this share is generated:

- If the share is **Discoverable**, members of your Organization will easily be able to find this share without the share URL. Alternatively, they can use the URL directly.
- If the share is **Hidden** (i.e. not Discoverable), other users will not be able to find the share URL.
You will need to send this URL directly to the users with whom you want to share this data.

## Consuming shared data

The **data consumer** needs to attach the share to their workspace, thereby creating a read-only, zero-copy clone of the source database. This is a free, metadata-only operation.

### Consuming discoverable shares

If the **data provider** created a Discoverable share, you should be able to find this share in the UI.

1. Select the share you want under "Shared with me".
2. Optionally, roll over the share to see the tooltip that tells you the share owner, when it was last updated, and the share access scope.
3. Click "attach".
4. You can now query the resulting database.

:::note
The ability to list and discover Discoverable shares in SQL is coming shortly.
:::

### Consuming hidden shares

If the **data provider** created a Hidden (i.e. non-Discoverable) share, they need to pass the share URL to the **data consumer**. The **data consumer**, in turn, needs to attach the share URL.

```sql
ATTACH 'md:_share/ducks/0a9a026ec5a55946a9de39851087ed81' AS birds; -- attaches the share as database `birds`
```

## Updating shared data

If, during creation of the share, the **data provider** chose to have the share updated automatically, the share will be updated periodically. If the share was created with `MANUAL` updates, the **data provider** needs to manually update the share.

```sql
UPDATE SHARE birds;
```

Learn more about [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md).

## Consuming updated data

By default, shares automatically update every minute. However, if you need the most up-to-date data sooner, the consumer can manually refresh the share after the producer executes `UPDATE SHARE`. To manually refresh the data:

```sql
REFRESH DATABASES; -- Refreshes all connected databases and shares
REFRESH DATABASE my_share; -- Alternatively, refresh a specific database/share
```

Learn more about [REFRESH DATABASES](/sql-reference/motherduck-sql-reference/refresh-database.md).

---

---
sidebar_position: 3
title: Sharing data with specific users
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

MotherDuck enables you to securely share data with specific users. Common scenarios include:

- Building data applications, in which each tenant should only have access to their own data.
- Sharing sensitive data within your Organization.
- Sharing data outside of your Organization.

Sharing data with others is easy:

1. The **data provider** creates a **Hidden** share.
2. The **data provider** gets back the share URL and passes this URL to the **data consumer**.
3. The **data consumer** **attaches** the share.
4. The **data provider** periodically updates the share to push new data to **data consumers**.

## 1. Creating hidden shares

To share a database, first create a Hidden share. No actual data is copied and no additional costs are incurred in this process.

Click on the "trident" next to the database you'd like to share. Select "share".

![trident](./img/ui-share_new.png)

1. Optionally name the share.
2. Under Access, choose "Specified users with the share link" if you'd like to limit access to a select set of members within your Organization. You can search for and add the specific users within your Organization that should have access. For these users the share will appear in the UI under 'Shared with me' and can be attached. Anyone within the organization that is not included in the list will not be able to access the share even if they have a share link.
If you need to share the data with MotherDuck users outside of your Organization, instead choose the "Anyone with the share link" option. This will enable anyone who has the share link to attach and query the share.

3. Create the share.
4. Copy the resulting **ATTACH** command to your clipboard and send it to your **data consumers**.

![trident](./img/ui-share3.png)

```sql
USE birds;
CREATE SHARE birds FROM birds (ACCESS UNRESTRICTED, VISIBILITY HIDDEN); -- This query creates a Hidden share accessible by anyone with the share link
> md:_share/birds/e9ads7-dfr32-41b4-a230-bsadgfdg32tfa
```

Save the returned share URL and pass it to your **data consumers**.

## 2. Consuming shares

The **data consumer** in your Organization can use SQL to attach the share and start querying it! Run the `ATTACH` command to attach the share as a queryable database. This is a zero-cost, metadata-only operation.

```sql
ATTACH 'md:_share/birds/e9ads7-dfr32-41b4-a230-bsadgfdg32tfa'; -- Creates a zero-copy clone database called birds
```

Learn more about [ATTACH SHARE](/sql-reference/motherduck-sql-reference/attach-share.md).

## 3. Updating shared data

If the database being shared has changed, in order for the changes to propagate to the **data consumer**, the **data provider** needs to update the share.

```sql
UPDATE SHARE birds;
```

Learn more about [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md).

:::note
We are working on auto-updating shares.
:::

## 4. Modifying share access

If you need to change who has access to the share, find the target share in the "Shares I've created" section of the Object Explorer and choose the 'Alter' option from the context menu. From here you can change the overall `ACCESS` type, or add and remove explicit user permissions if the share is set to `ACCESS RESTRICTED` ('Specified users with the share link').

---

---
sidebar_position: 2
title: Sharing data with your organization
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Sharing data with your organization

MotherDuck makes it easy for you to share data with all members of your Organization and to make that data easily discoverable and queryable. This is a common use case for small, highly collaborative data teams.

1. The **data provider** creates an **Organization**-scoped, **Discoverable** share.
2. **Data consumers** easily find the share and **attach** it.
3. The **data provider** periodically updates the share to push new data to **data consumers**.

## 1. Creating organization-scoped, discoverable shares

To share a database with your Organization, create a share. No actual data is copied and no additional costs are incurred in this process.

![trident](./img/ui-share_new.png)

Click on the "trident" next to the database you'd like to share. Select "share". Then:

1. Optionally, choose a share name. The default will be the database name.
2. Choose whether the share should only be accessible by users in your Organization, or anyone with the share link.
3. Choose whether the share should be automatically updated or not. The default is `MANUAL`.

```sql
USE birds;
CREATE SHARE; -- Shorthand syntax. Share name is optional. By default, shares are Organization-scoped and Discoverable.
CREATE SHARE birds FROM birds (ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE); -- This query is identical to the previous one but with explicit options.
```

## 2. Finding and consuming shares

The **data consumer** in your Organization can use the UI to find the share, attach it, and start querying it!
Select the share you want under "Shared with me". 2. Click "attach" and optionally name the resulting database. 3. You can now query the resulting database. :::note The ability to list and discover Discoverable shares in SQL is coming shortly. ::: ## 3. Updating shared data If the database being shared has changed, in order for the changes to propagate to the **data consumer**, the **data provider** needs to update the share. ```sql UPDATE SHARE birds; ``` Learn more about [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md). :::note We are working on auto-updating shares. ::: --- --- sidebar_position: 9 title: Using LLMs with MotherDuck --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; ## MCP Server ### What is MCP? The Model Context Protocol (MCP) is an open standard that enables AI assistants to interact with external data sources and tools. Think of MCP as a USB-C port for AI applications: it provides a standardized way to connect AI models to different data sources and tools. Learn more at [modelcontextprotocol.io](https://modelcontextprotocol.io/introduction). ### Purpose MotherDuck's DuckDB MCP Server implements this protocol to allow AI assistants like [Claude](https://claude.ai/) or AI IDEs like [Cursor](https://www.cursor.com/) to directly interact with your local DuckDB or MotherDuck cloud databases. It enables conversational SQL analytics without complex setup, letting you analyze your data through natural language conversations. ### Key Features - Query data from local DuckDB and/or MotherDuck cloud databases - Access data in cloud storage (AWS S3) through MotherDuck's integrations - Execute SQL analytics using natural language requests ### Getting Started :::note While the MCP server can connect to MotherDuck, you can also use it without any connection to the Cloud for pure DuckDB actions. Find out more about connecting to [local DuckDB here](https://github.com/motherduckdb/mcp-server-motherduck?tab=readme-ov-file#connect-to-local-duckdb). ::: To use the MCP server with MotherDuck, you'll need: - A MotherDuck account and access token - Claude Desktop, Cursor, VS Code, or another MCP-compatible client Setup guides: - [Cursor Integration](https://github.com/motherduckdb/mcp-server-motherduck?tab=readme-ov-file#running-in-sse-mode) - [VS Code Integration](https://github.com/motherduckdb/mcp-server-motherduck?tab=readme-ov-file#usage-with-vs-code) - [Claude Desktop Integration](https://github.com/motherduckdb/mcp-server-motherduck?tab=readme-ov-file#usage-with-claude-desktop) If the MCP server is exposed to third parties and should only have read access to data, we recommend using a [read scaling token](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#creating-a-read-scaling-token) and running the MCP server in [SaaS mode](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-saas-mode). - [Secure Configuration](https://github.com/motherduckdb/mcp-server-motherduck?tab=readme-ov-file#securing-your-mcp-server-when-querying-motherduck) For detailed information, view our [MCP Server repository](https://github.com/motherduckdb/mcp-server-motherduck). ## llms.txt You can access the DuckDB and MotherDuck documentation in Markdown format at motherduck.com/docs/llms.txt and duckdb.org/docs/stable/llms.txt.
These files are designed to help Large Language Models (LLMs) answer questions about DuckDB and MotherDuck based on the official documentation. ### Purpose The [`llms.txt`](https://motherduck.com/docs/llms.txt) file follows the emerging [llmstxt.org](https://llmstxt.org) standard for organizing documentation in a format optimized for AI assistants. It helps tools like ChatGPT, LangChain agents, and other LLMs: - Discover relevant information about MotherDuck’s features and capabilities - Understand and explain MotherDuck’s SQL dialect (including DuckDB and MotherDuck-specific syntax) - Assist with integration and setup questions - Troubleshoot common issues more effectively By pointing AI tools to our `llms.txt`, you make it easier for them to provide accurate, up-to-date answers based on our official documentation. ### Available Files #### MotherDuck - [`llms.txt`](https://motherduck.com/docs/llms.txt): Contains key information about MotherDuck’s features and capabilities. - [`llms-full.txt`](https://motherduck.com/docs/llms-full.txt): Comprehensive documentation covering all MotherDuck pages. #### DuckDB - [`llms.txt`](https://duckdb.org/docs/stable/llms.txt): Focused on DuckDB’s SQL dialect and features. - [`llms-full.txt`](https://duckdb.org/docs/stable/llms-full.txt): Full documentation for DuckDB. #### Difference between `llms.txt` and `llms-full.txt` While the `llms.txt` file is an index with hyperlinks to documentation pages, the `llms-full.txt` file provides the complete text of all documentation pages for either MotherDuck or DuckDB. It is designed for scenarios where the AI tool does not have the ability to navigate hyperlinks, and offers a way to make the full documentation accessible at once. It requires a model with a large context window (>128k tokens). #### Usage Tips - For MotherDuck-specific features and capabilities, use the MotherDuck `llms.txt` or `llms-full.txt`. - For DuckDB SQL dialect-specific questions, refer to the DuckDB `llms.txt` or `llms-full.txt`. - If unsure, add both files to the context to ensure comprehensive coverage of your question. ### Example Usage To prompt an LLM with questions using the `llms.txt` or `llms-full.txt` files: 1. Copy the content from an [available llms.txt or llms-full.txt](#available-files) file. 2. Use the following prompt format: ``` Documentation: {paste documentation here} --- Based on the above documentation, answer the following: {your question about DuckDB or MotherDuck} ``` When using Cursor, the `llms-full.txt` file can be added directly to the chat context with `@ (Weblink)`, e.g. `@motherduck.com/docs/llms-full.txt`. --- --- sidebar_position: 8 title: Write SQL with AI --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; ## Access SQL Assistant functions MotherDuck provides built-in AI features to help you write, understand, and fix DuckDB SQL queries more efficiently. These features include: - [Answer questions about your data](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-query) using the `prompt_query` pragma. - [Generate SQL](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-sql) for you using the `prompt_sql` table function. - [Correct and fix up your SQL query](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fixup) using the `prompt_fixup` table function. 
- [Correct and fix up your SQL query line-by-line](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line) using the `prompt_fix_line` table function. - [Help you understand a query](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-explain) using the `prompt_explain` table function. - [Help you understand contents of a database](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-schema) using the `prompt_schema` table function. ### Example usage of prompt_sql We use MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). ```sql CALL prompt_sql('what are the top domains being shared on hacker_news?'); ``` Output of this SQL statement is a single-column table that contains the AI-generated SQL query. | **query** | |-----------------| | SELECT COUNT(*) as domain_count, SUBSTRING(SPLIT_PART(url, '//', 2), 1, POSITION('/' IN SPLIT_PART(url, '//', 2)) - 1) as domain FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY domain_count DESC LIMIT 10 | ## Automatically Fix SQL Errors in the WebUI FixIt is a MotherDuck AI-powered UI feature that helps you resolve common SQL errors by offering fixes in-line. Read more about it in our [blog post](https://motherduck.com/blog/introducing-fixit-ai-sql-error-fixer/). FixIt can also be called programmatically using the `prompt_fix_line` function. Find more information [here](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line). ### How FixIt works By default, FixIt is enabled for all users. If you run a query that has an error, FixIt will automatically analyze the query and suggest in-line fixes. ![FixIt](./img/fixit_working.png) ![FixIt](./img/fixit_highlights.png) You can choose to accept, reject, or ignore a fix. If you accept a fix, MotherDuck will automatically update your query and re-execute it. You can also generate a different suggestion if the proposed fix is inaccurate. ### How to disable FixIt You can disable FixIt by clicking on the FixIt icon in the top right corner of the query editor. ---

{props.thing} in MotherDuck {props.verb} differences from DuckDB. When referencing information about {props.thing} in DuckDB Documentation at {props.ddburl}, consider the differences listed in this topic.

---

{props.thing} in MotherDuck {props.verb} no different than in DuckDB. For more information, see {props.ddburl} in DuckDB Documentation.

--- --- sidebar_position: 6 title: Aggregate functions --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Aggregate Functions} /> --- --- sidebar_position: 8 title: Configurations --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Configuration} /> --- --- sidebar_position: 9 title: Constraints --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Constraints} /> --- --- sidebar_position: 3 title: Data types --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Data Types} /> --- --- title: DuckDB SQL description: DuckDB SQL Reference --- --- --- title: ALTER TABLE --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; ALTER TABLE} /> --- --- title: ATTACH/DETACH --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; ATTACH/DETACH} /> --- --- title: CALL --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; CALL} /> --- --- title: COPY --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; COPY} /> --- --- title: CREATE INDEX --- # CREATE INDEX The `CREATE INDEX` statement in MotherDuck has differences from DuckDB. While the syntax is supported, indexes are not currently utilized for query acceleration in MotherDuck. This is generally not a concern as MotherDuck is already highly optimized for analytical workloads and provides excellent query performance through optimized data storage and processing. ## Key Differences - Indexes can be created but do not provide performance benefits - Queries that would use an index scan in DuckDB will use a sequential scan in MotherDuck instead ## Example ```sql -- Create a table and an index CREATE TABLE users(id INTEGER, name VARCHAR); INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'); CREATE INDEX idx_user_id ON users(id); -- This query will use a sequential scan in MotherDuck -- even though an index scan would be used in DuckDB SELECT * FROM users WHERE id = 1; ``` You can verify this behavior using the EXPLAIN statement: ```sql EXPLAIN SELECT * FROM users WHERE id = 100; -- Will show SEQ_SCAN in MotherDuck -- Would show INDEX_SCAN in DuckDB ``` :::note While queries that would benefit from index acceleration in DuckDB will use different execution plans in MotherDuck, MotherDuck's architecture is designed to provide fast analytical query performance even without indexes. The platform uses various optimizations and a cloud-native architecture to ensure efficient query execution. ::: Additionally, it's worth noting that indexes can significantly slow down `INSERT` operations, as the index needs to be updated with each new record. Since indexes don't provide query acceleration benefits in MotherDuck, creating them will only add this overhead without any corresponding advantages. For reference, you can learn more about how indexes work in DuckDB in their [Indexes documentation](https://duckdb.org/docs/sql/indexes). 
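Because indexes add `INSERT` overhead without accelerating queries in MotherDuck, dropping any indexes you have already created is a reasonable cleanup step. A minimal sketch, reusing the `idx_user_id` index from the example above: ```sql -- Remove the index; query plans are unaffected since MotherDuck does not use it DROP INDEX idx_user_id; ```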
--- --- title: CREATE MACRO --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; CREATE MACRO} /> --- --- title: CREATE TABLE --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; CREATE TABLE} /> --- --- title: DELETE --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; DELETE} /> --- --- title: DROP --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; DROP} /> --- --- title: DuckDB statements description: DuckDB statements --- --- --- title: EXPORT --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; Export & Import Database} /> --- --- title: INSERT --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; INSERT Statement} /> --- --- title: PIVOT --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; PIVOT Statement} /> --- --- title: SELECT --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; SELECT Statement} /> --- --- title: SET/RESET --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; SET/RESET} /> --- --- title: UNPIVOT --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; UNPIVOT Statement} /> --- --- title: UPDATE --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; UPDATE Statement} /> --- --- title: USE --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; USE} /> --- --- title: VACUUM --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; VACUUM} /> --- --- sidebar_position: 3 title: Enum data type --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; enum data type} /> --- --- sidebar_position: 3 title: Expressions --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Expressions} /> --- --- sidebar_position: 5 title: Functions --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Functions} /> --- --- sidebar_position: 10 title: Information schema --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Information Schema} /> If you want to query information about your MotherDuck entities, take a look at [md_information_schema](/sql-reference/motherduck-sql-reference/md_information_schema/introduction). --- --- sidebar_position: 11 title: Metadata functions --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; DuckDB_% Metadata Functions} /> --- --- sidebar_position: 12 title: PRAGMA statements --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Pragmas} /> --- --- sidebar_position: 2 title: Query syntax --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; SELECT Clause} /> --- --- sidebar_position: 13 title: SAMPLE --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Samples} /> --- --- sidebar_position: 7 title: Window functions --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; Window Functions} /> --- --- sidebar_position: 1 title: EMBEDDING --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility. ## Embedding Function The `embedding` function allows you to generate vector representations (embeddings) of text directly from SQL. 
These embeddings capture semantic meaning, enabling powerful similarity search and other natural language processing tasks. The function uses OpenAI's models: `text-embedding-3-small` (default) with 512 dimensions or `text-embedding-3-large` with 1024 dimensions. Both models support single- and multi-row inputs, enabling batch processing. The maximum input size is limited to 2048 characters - larger inputs will be truncated. Consumption is measured in [AI Units](/about-motherduck/billing/pricing#ai-function-pricing). One AI Unit equates to approximately: - 60,000 embedding rows with `text-embedding-3-small` - 12,000 embedding rows with `text-embedding-3-large` These estimates assume an input size of 1,000 characters. ### Syntax ```sql SELECT embedding(my_text_column) FROM my_table; -- returns FLOAT[512] column ``` ### Parameters The `embedding` function accepts parameters using named parameter syntax with the `:=` operator. | **Parameter** | **Required** | **Description** | |--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------| | `text_input` | Yes | The text to be converted into an embedding vector | | `model` | No | Model type, either `'text-embedding-3-small'` (default) or `'text-embedding-3-large'` | ### Return Types The `embedding` function returns different array sizes depending on the model used: - With `text-embedding-3-small`: Returns `FLOAT[512]` - With `text-embedding-3-large`: Returns `FLOAT[1024]` ### Examples #### Basic Embedding Generation ```sql -- Generate embeddings using the default model (text-embedding-3-small) SELECT embedding('This is a sample text') AS text_embedding; -- Generate embeddings using the larger model for higher dimensionality SELECT embedding('This is a sample text', model:='text-embedding-3-large') AS text_embedding; ``` #### Batch Processing ```sql -- Generate embeddings for multiple rows at once SELECT title, embedding(overview) AS overview_embeddings FROM kaggle.movies LIMIT 10; ``` ### Use Cases #### Creating an Embedding Database This example uses the sample movies dataset from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). ```sql --- Create a new table with embeddings for the first 100 overview entries CREATE TABLE my_db.movies AS SELECT title, overview, embedding(overview) AS overview_embeddings FROM kaggle.movies LIMIT 100; ``` If write access to the source table is available, the embedding column can also be added in place: ```sql --- Update the existing table to add new column for embeddings ALTER TABLE my_db.movies ADD COLUMN overview_embeddings FLOAT[512]; --- Populate the column with embeddings UPDATE my_db.movies SET overview_embeddings = embedding(overview); ``` The movies table now contains a new column `overview_embeddings` with vector representations of each movie description: ```sql SELECT * FROM my_db.movies; ``` | **title** | **overview** | **overview_embeddings** | | ----------------- | ----------------- |----------------------------------------------------| | 'Toy Story 3' | 'Led by Woody, Andy's toys live happily in [...]' | [0.023089351132512093, -0.012809964828193188, ...] | | 'Jumanji' | 'When siblings Judy and Peter discover an [...]' | [-0.005538413766771555, 0.0799209326505661, ...] | | ... | ... | ... | #### Semantic Similarity Search The `array_cosine_similarity` function can be used to compute similarities between embeddings. 
This enables semantic search to retrieve entries that are conceptually or semantically similar to a query, even if they don't share the same keywords. ```sql -- Find movies similar to "Toy Story" based on semantic similarity SELECT title, overview, array_cosine_similarity( embedding('Led by Woody, Andy''s toys live happily [...]'), overview_embeddings ) AS similarity FROM kaggle.movies WHERE title != 'Toy Story' ORDER BY similarity DESC LIMIT 5; ``` | **title** | **overview** | **similarity** | |-----------------|-----------------|-----------------| |'Toy Story 3'|'Woody, Buzz, and the rest of Andy's toys haven't [...]'|0.7372807860374451| |'Toy Story 2'|'Andy heads off to Cowboy Camp, leaving his toys [...]'|0.7222828269004822| |... |... |... | #### Building a Recommendation System Embeddings can be used to build content-based recommendation systems: ```sql -- Create a macro to recommend similar movies CREATE OR REPLACE MACRO recommend_similar_movies(movie_title) AS TABLE ( WITH target_embedding AS ( SELECT embedding(overview) AS emb FROM sample_data.kaggle.movies WHERE title = movie_title LIMIT 1 ) SELECT m.title AS recommended_title, m.overview, array_cosine_similarity(t.emb, m.overview_embeddings) AS similarity FROM sample_data.kaggle.movies m, target_embedding t WHERE m.title != movie_title ORDER BY similarity DESC LIMIT 5 ); -- Use the macro to get recommendations SELECT * FROM recommend_similar_movies('The Matrix'); ``` #### Retrieval-Augmented Generation (RAG) Embeddings are a key component in building [RAG](https://motherduck.com/blog/search-using-duckdb-part-2/) systems, which can be combined with the [`prompt` function](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/prompt/#retrieval-augmented-generation-rag) for powerful question-answering capabilities: ```sql -- Create a reusable macro for question answering CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE ( SELECT question_text AS question, prompt( 'User asks the following question:\n' || question_text || '\n\n' || 'Here is some additional information:\n' || STRING_AGG('Title: ' || title || '; Description: ' || overview, '\n') || '\n' || 'Please answer the question based only on the additional information provided.', model := 'gpt-4o' ) AS response FROM ( SELECT title, overview FROM sample_data.kaggle.movies ORDER BY array_cosine_similarity(overview_embeddings, embedding(question_text)) DESC LIMIT 3 ) ); -- Use the macro to answer questions SELECT question, response FROM ask_question('Can you recommend some good sci-fi movies about AI?'); ``` ### Security Considerations When passing free-text arguments from external sources to the embedding function (e.g., user questions in a RAG application), always use prepared statements to prevent SQL injection. ```python # Using prepared statements in Python (assumes a MotherDuck connection via the motherduck_token environment variable) import duckdb con = duckdb.connect("md:") user_query = "Led by Woody, Andy's toys live happily [...]" con.execute(""" SELECT title, overview, array_cosine_similarity(embedding(?), overview_embeddings) as similarity FROM kaggle.movies ORDER BY similarity DESC LIMIT 5""", [user_query]) ``` ### Error Handling When usage limits have been reached or an unexpected error occurs while computing embeddings, the function will not fail the entire query but will return `NULL` values for the affected rows.
To check if all embeddings were computed successfully: ```sql -- Check for NULL values in embedding column SELECT count(*) FROM my_db.movies WHERE overview_embeddings IS NULL AND overview IS NOT NULL; ``` Missing values can be filled in with a separate query: ```sql -- Fill in missing embedding values UPDATE my_db.movies SET overview_embeddings = embedding(overview) WHERE overview_embeddings IS NULL AND overview IS NOT NULL; ``` ### Performance Considerations - **Batch Processing**: when processing multiple rows, consider using `LIMIT` to control the number of API calls. - **Model Selection**: use `text-embedding-3-small` for faster, less expensive embeddings when the highest precision isn't critical. - **Caching**: results are not cached between queries, so consider storing embeddings in tables for repeated use. - **Dimensionality**: higher dimensions (using `text-embedding-3-large`) provide more precise semantic representation but require more storage and computation time. ### Notes These capabilities are provided by MotherDuck's integration with OpenAI, and inputs to the embedding function will be processed by OpenAI. For availability and usage limits, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model). Usage limits are in place to safeguard your spend, not because of throughput limitations. MotherDuck has the capacity to handle high-volume embedding workloads and is always open to working alongside customers to support any type of workload and model requirements. If higher usage limits are needed, please reach out directly to the [Slack support channel](https://slack.motherduck.com/) or email support@motherduck.com; we're always happy to help! --- --- sidebar_position: 1 title: PROMPT --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility. ## Prompt Function The `prompt` function allows you to interact with Large Language Models (LLMs) directly from SQL. You can generate both free-form text and structured data outputs. The function uses OpenAI's models: `gpt-4o-mini` (default) or `gpt-4o`. Both models support single- and multi-row inputs, enabling batch processing. Consumption is measured in [AI Units](/about-motherduck/billing/pricing#ai-function-pricing). One AI Unit equates to approximately: - 4,000 prompt responses with `gpt-4o-mini` - 250 prompt responses with `gpt-4o` These estimates assume an input size of 1,000 characters and response size of 250 characters. ### Syntax ```sql SELECT prompt('Write a poem about ducks'); -- returns a single-cell table with the response ``` ### Parameters | **Parameter** | **Required** | **Description** | |--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------| | `prompt_text` | Yes | The text input to send to the model | | `model` | No | Model type, either `'gpt-4o-mini'` (default), or `'gpt-4o-2024-08-06'` (alias: `'gpt-4o'`) | | `temperature` | No | Model temperature value between `0` and `1`, default: `0.1`. Lower values produce more deterministic outputs. | | `struct` | No | Output schema as struct, e.g. `{summary: 'VARCHAR', persons: 'VARCHAR[]'}`. Will result in `STRUCT` output. | | `struct_descr` | No | Descriptions for struct fields that will be added to the model's context, e.g. 
`{summary: 'a 1 sentence summary of the text', persons: 'an array of all persons mentioned in the text'}` | | `json_schema` | No | A JSON schema that adheres to [OpenAI's structured output guide](https://platform.openai.com/docs/guides/structured-outputs/supported-schemas). Provides more flexibility than the struct/struct_descr parameters. Will result in `JSON` output. | ### Return Types The `prompt` function can return different data types depending on the parameters used: - Without structure parameters: Returns `VARCHAR` - With `struct` parameter: Returns a `STRUCT` with the specified schema - With `json_schema` parameter: Returns `JSON` ### Examples #### Basic Text Generation ```sql -- Call gpt-4o-mini (default) to generate text SELECT prompt('Write a poem about ducks') AS response; -- Call gpt-4o with higher temperature for more creative outputs SELECT prompt('Write a poem about ducks', model:='gpt-4o', temperature:=1) AS response; ``` #### Structured Output with Struct ```sql -- Extract structured information from text using struct parameter SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.', struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'}, struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review; ``` This returns a `STRUCT` value that can be accessed with dot notation: ```sql SELECT zoo_review.summary, zoo_review.favourite_animals, zoo_review.star_rating FROM ( SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.', struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'}, struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review ); ``` #### Structured Output with JSON Schema ```sql -- Extract structured information using JSON schema SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.', json_schema := '{ "name": "zoo_visit_review", "schema": { "type": "object", "properties": { "summary": { "type": "string" }, "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] }, "animals_seen": { "type": "array", "items": { "type": "string" } } }, "required": ["summary", "sentiment", "animals_seen"], "additionalProperties": false }, "strict": true }') AS json_review; ``` The returned `JSON` value can be accessed using JSON extraction functions: ```sql SELECT json_extract_string(json_review, '$.summary') AS summary, json_extract_string(json_review, '$.sentiment') AS sentiment, json_extract(json_review, '$.animals_seen') AS animals_seen FROM ( SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.', json_schema := '{ ... }') AS json_review ); ``` ### Use Cases #### Text Generation Using the prompt function to write a poem about ducks: ```sql --- Prompt LLM to write a poem about ducks SELECT prompt('Write a poem about ducks') AS response; ``` | **response** | |------------------------------------------------------------------------------------------------------------------| | 'Beneath the whispering willow trees, Where ripples dance with wayward breeze, A symphony of quacks arise [...]' | #### Summarization We use the prompt function to create a one-sentence summary of movie descriptions.
The example is based on the sample movies dataset from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). ```sql --- Create a new table with summaries for the first 100 overview texts CREATE TABLE my_db.movies AS SELECT title, overview, prompt('Summarize this movie description in one sentence: ' || overview) AS summary FROM kaggle.movies LIMIT 100; ``` If write access to the source table is available, the summary column can also be added in place: ```sql --- Update the existing table to add new column for summaries ALTER TABLE my_db.movies ADD COLUMN summary VARCHAR; --- Populate the column with summaries UPDATE my_db.movies SET summary = prompt('Summarize this movie description in one sentence: ' || overview); ``` The movies table now contains a new column `summary` with one-sentence summaries of the movies: ```sql SELECT title, overview, summary FROM my_db.movies; ``` | **title** | **overview** | **summary** | |-----------|----------------------------------------------|------------------------------------------------------| | Toy Story | Led by Woody, Andy's toys live happily [...] | In "Toy Story," Woody's jealousy of the new [...] | | Jumanji | When siblings Judy and Peter discover [...] | In this thrilling adventure, siblings Judy and [...] | | ... | ... | ... | #### Structured Data Extraction The prompt function can be used to extract structured data from text. The example is based on the same sample movies dataset from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). This time we aim to extract structured metadata from the movie's overview description. We are interested in the main characters mentioned in the descriptions, as well as the movie's genre and a rating of how much action the movie contains, on a scale of 1 (no action) to 5 (lots of action). For this, we make use of the `struct` and `struct_descr` parameters, which will result in structured output. ```sql --- Update the existing table to add new column for structured metadata ALTER TABLE my_db.movies ADD COLUMN metadata STRUCT(main_characters VARCHAR[], genre VARCHAR, action INTEGER); --- Populate the column with structured information UPDATE my_db.movies SET metadata = prompt( overview, struct:={main_characters: 'VARCHAR[]', genre: 'VARCHAR', action: 'INTEGER'}, struct_descr:={ main_characters: 'an array of the main character names mentioned in the movie description', genre: 'the primary genre of the movie based on the description', action: 'rate on a scale from 1 (no action) to 5 (high action) how much action the movie contains' } ); ``` The resulting `metadata` field is a `STRUCT` that can be accessed as follows: ```sql SELECT title, overview, metadata.main_characters, metadata.genre, metadata.action FROM my_db.movies; ``` | **title** | **overview** | **metadata.main_characters** | **metadata.genre** | **metadata.action** | |-----------|----------------------------------------------|-------------------------------------------------------------------------|------------------------------|------------| | Toy Story | Led by Woody, Andy's toys live happily [...] | ['"Woody"', '"Buzz Lightyear"', '"Andy"', '"Mr. Potato Head"', '"Rex"'] | Animation, Adventure, Comedy | 3 | | Jumanji | When siblings Judy and Peter discover [...] | ['"Judy Shepherd"', '"Peter Shepherd"', '"Alan Parrish"'] | Adventure, Fantasy, Family | 4 | | ... | ... | ... | ... | ... 
| #### Batch Processing The `prompt` function can process multiple rows in a single query: ```sql --- Process multiple rows at once SELECT title, prompt('Write a tagline for this movie: ' || overview) AS tagline FROM kaggle.movies LIMIT 10; ``` #### Retrieval-Augmented Generation (RAG) The `prompt` function can be combined with [similarity search on embeddings](/sql-reference/motherduck-sql-reference/ai-functions/embedding/#example-similarity-search) to build a [RAG](https://motherduck.com/blog/search-using-duckdb-part-2/) pipeline: ```sql -- Create a reusable macro for question answering CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE ( SELECT question_text AS question, prompt( 'User asks the following question:\n' || question_text || '\n\n' || 'Here is some additional information:\n' || STRING_AGG('Title: ' || title || '; Description: ' || overview, '\n') || '\n' || 'Please answer the question based only on the additional information provided.', model := 'gpt-4o' ) AS response FROM ( SELECT title, overview FROM kaggle.movies ORDER BY array_cosine_similarity(overview_embeddings, embedding(question_text)) DESC LIMIT 3 ) ); -- Use the macro to answer questions SELECT question, response FROM ask_question('Can you recommend some good sci-fi movies about AI?'); ``` This will result in the following output: | **question** | **response** | |-----------------------------------------------------|-----------------------------------------------------------------------------------| | Can you recommend some good sci-fi movies about AI? | Based on the information provided, here are some sci-fi movies about AI that you might enjoy: [...] | :::warning When passing free-text arguments from external sources to the prompt function (e.g., user questions in a RAG application), always use prepared statements to prevent SQL injection. ::: Using prepared statements in [Python](/getting-started/connect-query-from-python/query-data/): ```python # Connect to MotherDuck (assumes the motherduck_token environment variable is set) import duckdb con = duckdb.connect("md:") # First register the macro con.execute(""" CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE ( -- Macro definition as above ); """) # Then use prepared statements for user input user_query = "Can you recommend some good sci-fi movies about AI?" result = con.execute(""" SELECT response FROM ask_question(?) """, [user_query]).fetchone() print(result[0]) ``` ### Error Handling When usage limits have been reached or an unexpected error occurs while computing prompt responses, the function will not fail the entire query but will return `NULL` values for the affected rows. To check if all responses were computed successfully, check if any values in the resulting column are null. ```sql -- Check for NULL values in the summary column SELECT count(*) FROM my_db.movies WHERE summary IS NULL AND overview IS NOT NULL; ``` Missing values can be filled in with a separate query: ```sql -- Fill in missing prompt responses UPDATE my_db.movies SET summary = prompt('Summarize this movie description in one sentence: ' || overview) WHERE summary IS NULL AND overview IS NOT NULL; ``` ### Performance Considerations - **Batch Processing**: When processing multiple rows, consider using `LIMIT` to control the number of API calls. - **Model Selection**: Use `gpt-4o-mini` for faster, less expensive responses when high accuracy isn't critical. - **Caching**: Results are not cached between queries, so consider storing results in tables for repeated use. ### Notes These capabilities are provided by MotherDuck's integration with OpenAI. 
Inputs to the prompt function will be processed by OpenAI. For availability and usage limits, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model). If higher usage limits are needed, please reach out directly to the [Slack support channel](https://slack.motherduck.com/) or email support@motherduck.com. --- --- sidebar_position: 0.9 title: PROMPT_EXPLAIN --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ## Explain a query The `prompt_explain` table function allows MotherDuck AI to analyze and explain SQL queries in plain English. This feature helps you understand complex queries, verify that a query does what you intend, and learn SQL concepts through practical examples. This function is particularly useful for understanding queries written by others or for automatically documenting your own queries for future reference. ### Syntax ```sql CALL prompt_explain('<query>', [include_tables=['<table_name_1>', '<table_name_2>']]); ``` ### Parameters | **Parameter** | **Required** | **Description** | |--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------| | `query` | Yes | The SQL query to explain | | `include_tables` | No | Array of table names to consider for context (defaults to all tables in current database). Can also be a dictionary in the format `{'table_name': ['column1', 'column2']}` to specify which columns to include for each table. | ### Example usage Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). #### Explaining a complex query ```sql CALL prompt_explain(' SELECT COUNT(*) as domain_count, SUBSTRING(SPLIT_PART(url, ''//'', 2), 1, POSITION(''/'' IN SPLIT_PART(url, ''//'', 2)) - 1) as domain FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY domain_count DESC LIMIT 10; '); ``` **Output**: when you run a `prompt_explain` query, you'll receive a single-column table with a detailed explanation: | **explanation** | |-----------------| |The query retrieves the top 10 most frequent domains from the `url` field in the `hn.hacker_news` table. It counts the occurrences of each domain by extracting the domain part from the URL (after the '//' and before the next '/'), groups the results by domain, and orders them in descending order of their count. The result includes the count of occurrences (`domain_count`) and the domain name itself (`domain`). | #### Using dictionary format for include_tables You can specify which columns to include for each table using the dictionary format: ```sql CALL prompt_explain(' SELECT u.id, u.name, COUNT(s.id) AS story_count FROM hn.users u LEFT JOIN hn.stories s ON u.id = s.user_id GROUP BY u.id, u.name HAVING COUNT(s.id) > 5 ORDER BY story_count DESC LIMIT 20; ', include_tables={'hn.users': ['id', 'name'], 'hn.stories': ['id', 'user_id']}); ``` This approach allows you to focus the explanation on only the relevant columns, which can be helpful for tables with many columns. #### How it works The `prompt_explain` function processes your query in several steps: 1. **Parsing**: analyzes the SQL syntax to understand the query structure 2. **Schema analysis**: examines the referenced tables and columns to understand the data model 3. 
**Operation analysis**: identifies the operations being performed (filtering, joining, aggregating, etc.) 4. **Translation**: converts the technical SQL into a clear, human-readable explanation 5. **Context addition**: adds relevant context about the purpose and expected results of the query ### Best practices For the best results with `prompt_explain`: 1. **Provide complete queries**: include all parts of the query for the most accurate explanation 2. **Use table aliases consistently**: this helps the function understand table relationships 3. **Specify relevant tables**: use the `include_tables` parameter for large databases 4. **Review explanations**: verify that the explanation matches your understanding of the query 5. **Use for documentation**: save explanations as comments in your code for future reference ### Notes MotherDuck AI operates on your current database by evaluating the schemas and contents of the database. You can specify which tables and columns should be considered using the optional `include_tables` [parameter](../prompt-sql/#include-tables-parameter). By default, all tables in the current database are considered. To point MotherDuck AI at a specific database, execute the `USE database` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)). These capabilities are provided by MotherDuck's integration with OpenAI. For availability and pricing, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model). If you need higher usage limits or have specific requirements, please reach out to the [Slack support channel](https://slack.motherduck.com/) or email [support@motherduck.com](mailto:support@motherduck.com). --- --- sidebar_position: 0.9 title: PROMPT_FIX_LINE --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ## Fix your query line-by-line The `prompt_fix_line` table function allows MotherDuck AI to correct specific lines in your SQL queries that contain syntax or spelling errors. Unlike [`prompt_fixup`](../prompt-fixup), which rewrites the entire query, this function targets only the problematic lines, making it faster and more precise for localized errors. This function is ideal for fixing minor syntax errors in large queries where you want to preserve most of the original query structure and formatting. ### Syntax ```sql CALL prompt_fix_line('<query>', error='<error_message>', [include_tables=['<table_name_1>', '<table_name_2>']]); ``` ### Parameters | **Parameter** | **Required** | **Description** | |--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------| | `query` | Yes | The SQL query that needs correction | | `error` | No | The error message from the SQL parser (helps identify the problematic line) | | `include_tables` | No | Array of table names to consider for context (defaults to all tables in current database) | ### Example usage Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). #### Fixing simple syntax errors ```sql -- Fixing a misspelled keyword with error message CALL prompt_fix_line('SEELECT COUNT(*) as domain_count FROM hn.hackers', error=' Parser Error: syntax error at or near "SEELECT" LINE 1: SEELECT COUNT(*) as domain_count FROM h... 
^'); -- Fixing a typo in a column name CALL prompt_fix_line('SELECT user_id, titlee, score FROM hn.stories LIMIT 10'); -- Fixing incorrect operator usage CALL prompt_fix_line('SELECT * FROM hn.stories WHERE score => 100'); ``` #### Fixing errors in multi-line queries ```sql -- Fixing a specific line in a complex query CALL prompt_fix_line('SELECT user_id, COUNT(*) AS post_count, AVG(scor) AS average_score FRUM hn.stories GROUP BY user_id ORDER BY post_count DESC LIMIT 10', error=' Parser Error: syntax error at or near "FRUM" LINE 5: FRUM hn.stories ^'); ``` ### Example output When you run a `prompt_fix_line` query, you'll receive a two-column table with the line number and corrected content: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 1 | SELECT COUNT(*) as domain_count FROM hn.hackers | For multi-line queries, only the problematic line is corrected: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 5 | FROM hn.stories | #### How it works The `prompt_fix_line` function processes your query in a targeted way: 1. **Error localization**: uses the error message (if provided) to identify the specific line with issues 2. **Context analysis**: examines surrounding lines to understand the query's structure and intent 3. **Targeted correction**: fixes only the problematic line while preserving the rest of the query 4. **Line replacement**: returns the corrected line with its line number for easy integration For example, when fixing a syntax error in a single line: ```sql CALL prompt_fix_line('SEELECT COUNT(*) as domain_count FROM hn.hackers', error=' Parser Error: syntax error at or near "SEELECT" LINE 1: SEELECT COUNT(*) as domain_count FROM h... ^'); ``` The function will focus only on line 1, correcting the misspelled keyword: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 1 | SELECT COUNT(*) as domain_count FROM hn.hackers | For multi-line queries with an error on a specific line: ```sql CALL prompt_fix_line('SELECT user_id, COUNT(*) AS post_count, AVG(scor) AS average_score FRUM hn.stories GROUP BY user_id ORDER BY post_count DESC LIMIT 10', error=' Parser Error: syntax error at or near "FRUM" LINE 5: FRUM hn.stories ^'); ``` The function will only correct line 5, leaving the rest of the query untouched: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 5 | FROM hn.stories | This allows you to apply the fix by replacing just the problematic line in your original query, which is especially valuable for large, complex queries where a complete rewrite would be disruptive. 
When multiple errors exist, you would run `prompt_fix_line` multiple times, fixing one line at a time: ```sql -- First fix CALL prompt_fix_line('SELECT user_id, COUNT(*) AS post_count, AVG(scor) AS average_score FRUM hn.stories GROUP BY user_id ORDER BY post_count DESC LIMIT 10', error=' Parser Error: syntax error at or near "FRUM" LINE 5: FRUM hn.stories ^'); -- After applying the first fix, run again for the second error CALL prompt_fix_line('SELECT user_id, COUNT(*) AS post_count, AVG(scor) AS average_score FROM hn.stories GROUP BY user_id ORDER BY post_count DESC LIMIT 10', error=' Parser Error: column "scor" does not exist LINE 4: AVG(scor) AS average_score ^'); ``` The second call would return: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 4 | AVG(score) AS average_score | Note: you need to run `prompt_fix_line` multiple times to fix all errors. ### Best practices For the best results with `prompt_fix_line`: 1. **Include the error message**: the parser error helps pinpoint the exact issue 2. **Preserve query structure**: use this function when you want to maintain most of your original query 3. **Fix one error at a time**: to address multiple errors, run `prompt_fix_line` multiple times 4. **Include context**: provide the complete query, not just the problematic line 5. **Be specific with table names**: use the `include_tables` parameter for large databases ### Limitations While `prompt_fix_line` is efficient, be aware of these limitations: - Only fixes syntax errors, not logical errors in query structure - Accurate error messages help identify the problematic line and improve output - May not be able to fix errors that span multiple lines - Cannot fix issues related to missing tables or columns in your database - Works best with standard SQL patterns and common table structures ### Troubleshooting If you're not getting the expected results: - Ensure you've included the complete error message - Check that the line numbers in the error message match your query - For complex errors, try using `prompt_fixup` instead - If multiple lines need fixing, address them one at a time - Verify that your database schema is accessible to the function ### Notes MotherDuck AI operates on your current database by evaluating the schemas and contents of the database. You can specify which tables and columns should be considered using the optional `include_tables` [parameter](../prompt-sql/#include-tables-parameter). By default, all tables in the current database are considered. To point MotherDuck AI at a specific database, execute the `USE database` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)). These capabilities are provided by MotherDuck's integration with OpenAI. For availability and pricing, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model). If you need higher usage limits or have specific requirements, please reach out to the [Slack support channel](https://slack.motherduck.com/) or email [support@motherduck.com](mailto:support@motherduck.com). --- --- sidebar_position: 0.9 title: PROMPT_FIXUP --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ## Fix up your query The `prompt_fixup` table function allows MotherDuck AI to correct and **completely rewrite** SQL queries that have logical or severe syntactical issues. 
This powerful feature analyzes your problematic query, identifies issues, and generates a corrected version that follows proper SQL syntax and semantics. For minor syntax errors or typos in large queries, consider using the [`prompt_fix_line`](../prompt-fix-line) function instead, which is faster and more precise as it only rewrites the problematic line. ### Syntax ```sql CALL prompt_fixup('<query>', [include_tables=['<table_name_1>', '<table_name_2>']]); ``` ### Parameters | **Parameter** | **Required** | **Description** | |--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------| | `query` | Yes | The SQL query that needs correction | | `include_tables` | No | Array of table names to consider for context (defaults to all tables in current database) | ### Example usage Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). #### Fixing syntax errors ```sql -- Fixing misspelled keywords CALL prompt_fixup('SEELECT COUNT(*) as domain_count FROM hn.hackers'); -- Fixing incorrect table names CALL prompt_fixup('SELECT * FROM hn.stories WHERE score > 100 ODER BY score DESC'); -- Fixing missing clauses CALL prompt_fixup('SELECT AVG(score) hn.hacker_news GROUP score > 10'); ``` #### Fixing logical errors ```sql -- Fixing incorrect join syntax CALL prompt_fixup('SELECT u.name, s.title FROM hn.users u, hn.stories s WHERE u.id = s.user_id ORDER BY s.score'); -- Fixing aggregation issues CALL prompt_fixup('SELECT user_id, AVG(score) FROM hn.stories GROUP BY score'); -- Fixing complex query structure CALL prompt_fixup('SELECT COUNT(*) FROM hn.stories WHERE timestamp > "2020-01-01" AND timestamp < "2020-12-31" WITH score > 100'); ``` ### Example output When you run a `prompt_fixup` query, you'll receive a single-column table with the corrected SQL: | **query** | |-----------------| | SELECT COUNT(*) as domain_count FROM hn.hacker_news | #### How it works The `prompt_fixup` function processes your query in several steps: 1. **Analysis**: examines your query to identify syntax errors, logical issues, and structural problems 2. **Schema validation**: checks your query against the database schema to ensure table and column references are valid 3. **Correction**: applies fixes based on the identified issues and your likely intent 4. **Rewriting**: generates a complete, corrected version of your query that maintains your original goal For example, when fixing this query with multiple issues: ```sql CALL prompt_fixup('SEELECT AVG(scor) FRUM hn.stories WERE timestamp > "2020-01-01" GRUP BY user_id'); ``` The function will: - Correct misspelled keywords (`SEELECT` → `SELECT`, `FRUM` → `FROM`, `WERE` → `WHERE`, `GRUP` → `GROUP`) - Fix column name typos (`scor` → `score`) - Ensure proper clause ordering and syntax Resulting in a properly formatted query: | **query** | |-----------------| | SELECT AVG(score) FROM hn.stories WHERE timestamp > '2020-01-01' GROUP BY user_id | For logical errors, the process is similar but focuses on semantic correctness: ```sql CALL prompt_fixup('SELECT user_id, AVG(score) FROM hn.stories GROUP BY score'); ``` Will be corrected to: | **query** | |-----------------| | SELECT user_id, AVG(score) FROM hn.stories GROUP BY user_id | The function recognized that grouping should be by `user_id` (the non-aggregated column) rather than by `score` (which is being averaged). 
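Before applying a corrected query, you can also sanity-check it with the `prompt_explain` function documented earlier in this reference (see the best practices below). A minimal sketch, using the corrected query from above: ```sql -- Confirm the corrected query matches your intent before running it CALL prompt_explain('SELECT user_id, AVG(score) FROM hn.stories GROUP BY user_id'); ```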
### Best practices For the best results with `prompt_fixup`: 1. **Include the entire query**: even if only part of it has issues 2. **Be specific with table names**: use the `include_tables` parameter for large databases 3. **Review the fixed query**: always check that the corrected query matches your intent 4. **Use for complex issues**: prefer this function for logical errors or major syntax problems 5. **Consider alternatives**: for simple typos, `prompt_fix_line` may be more efficient ### Limitations While `prompt_fixup` is powerful, be aware of these limitations: - May change query logic if the original intent isn't clear - Performance depends on the complexity of your query - Works best with standard SQL patterns and common table structures - May not preserve exact formatting or comments from the original query - Cannot fix issues related to missing tables or columns in your database ### Troubleshooting If you're not getting the expected results: - Check that you've included all relevant tables in the `include_tables` parameter - Ensure your database schema is accessible to the function - For very complex queries, try breaking them into smaller parts - If the fixed query doesn't match your intent, try providing more context in comments ### Notes MotherDuck AI operates on your current database by evaluating the schemas and contents of the database. You can specify which tables and columns should be considered using the optional `include_tables` [parameter](../prompt-sql/#include-tables-parameter). By default, all tables in the current database are considered. To point MotherDuck AI at a specific database, execute the `USE database` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)). These capabilities are provided by MotherDuck's integration with OpenAI. For availability and pricing, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model). If you need higher usage limits or have specific requirements, please reach out to the [Slack support channel](https://slack.motherduck.com/) or email [support@motherduck.com](mailto:support@motherduck.com). --- --- sidebar_position: 0.1 title: PROMPT_QUERY --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ## Answer questions about your data The `prompt_query` pragma allows you to ask questions about your data in natural language. This feature translates your plain English questions into SQL, executes the query, and returns the results. Under the hood, MotherDuck analyzes your database schema, generates appropriate SQL, and executes the query on your behalf. This makes data exploration and analysis accessible to users of all technical levels. The `prompt_query` pragma is a read-only operation and does not allow queries that modify the database. ### Syntax ```sql PRAGMA prompt_query('<question>') ``` ### Parameters | **Parameter** | **Required** | **Description** | |--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------| | `question` | Yes | The natural language question about your data | ### Example usage Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). `prompt_query` can be used to answer both simple and complex questions. 
#### Basic questions ```sql -- Find the most shared domains PRAGMA prompt_query('what are the top domains being shared on hacker_news?') -- Analyze posting patterns PRAGMA prompt_query('what day of the week has the most posts?') -- Identify trends PRAGMA prompt_query('how has the number of posts changed over time?') ``` #### Complex questions ```sql -- Multi-part analysis PRAGMA prompt_query('what are the top 5 domains with the highest average score, and how many stories were posted from each?') -- Time-based analysis PRAGMA prompt_query('compare the average score of posts made during weekdays versus weekends') -- Conditional filtering PRAGMA prompt_query('which users have posted the most stories about artificial intelligence or machine learning?') ``` ### Best practices For the best results with `prompt_query`: 1. **Be specific**: clearly state what information you're looking for 2. **Provide context**: include relevant details about the data you want to analyze 3. **Use natural language**: phrase your questions as you would ask a data analyst 4. **Start simple**: begin with straightforward questions and build to more complex ones 5. **Refine iteratively**: if results aren't what you expected, try rephrasing your question ### Limitations While `prompt_query` is powerful, be aware of these limitations: - Only performs read operations (`SELECT` queries) - Works best with well-structured data with clear column names - Complex statistical analyses will likely require you (or an LLM) to write SQL - Performance depends on the complexity of your question and database size - May not understand highly domain-specific terminology without you giving more context ### Troubleshooting If you're not getting the expected results: - Check that you're connected to the correct database - Ensure your question is clear and specific - Try rephrasing your question using different terms - For complex analyses, break down into multiple simpler questions ### Notes MotherDuck AI operates on your current database by evaluating the schemas and contents of the database. To point MotherDuck AI at a specific database, execute the `USE database` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)). Usage limits are in place to safeguard your spend, not because of throughput limitations. MotherDuck has the capacity to handle high-volume workloads and is always open to working alongside customers to support any type of requirement. These capabilities are provided by MotherDuck's integration with OpenAI. For availability and pricing, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model). If you need higher usage limits or have specific requirements, please reach out to the [Slack support channel](https://slack.motherduck.com/) or email [support@motherduck.com](mailto:support@motherduck.com). --- --- sidebar_position: 0.9 title: PROMPT_SCHEMA --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ## Describe contents of a database The `prompt_schema` table function allows MotherDuck AI to analyze and describe the contents of your current database in plain English. This feature helps you understand the structure, purpose, and relationships between tables in your database without having to manually inspect each table's schema. 
This function is particularly useful when working with unfamiliar databases or when you need a high-level overview of a complex database structure. ### Syntax ```sql CALL prompt_schema([include_tables=['<table>', '<table>']]); ``` ### Parameters | **Parameter** | **Required** | **Description** | |--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------| | `include_tables` | No | Array of table names to consider for analysis (defaults to all tables in current database) | ### Example usage Here are several examples using MotherDuck's [sample data database](/getting-started/sample-data-queries/datasets). #### Describing the entire database ```sql CALL prompt_schema(); ``` #### Example output When you run a `prompt_schema` query, you'll receive a single-column table with a detailed description: | **summary** | |-----------------| | The database contains tables related to ambient air quality data, Stack Overflow survey results, NYC taxi and service requests, rideshare data, movie information with embeddings, and Hacker News articles, capturing a wide range of information from environmental metrics to user-generated content and transportation data. | #### Describing specific tables ```sql CALL prompt_schema(include_tables=['hn.hacker_news', 'hn.stories']); ``` | **summary** | |-----------------| | The database contains information about Hacker News posts, including details such as the title, URL, content, author, score, time of posting, type of post, and various identifiers and status flags. | #### How it works The `prompt_schema` function processes your database in several steps: 1. **Schema extraction**: examines the structure of tables, including column names and data types 2. **Data sampling**: analyzes sample data to understand the content and purpose of each table 3. **Relationship detection**: identifies potential relationships between tables based on column names and values 4. **Domain recognition**: categorizes tables into domains or subject areas based on their content 5. **Summary generation**: creates a human-readable description of the database structure and purpose ### Best practices For the best results with `prompt_schema`: 1. **Focus on relevant tables**: use the `include_tables` parameter to analyze specific parts of large databases 2. **Run on updated databases**: ensure your database is up-to-date for the most accurate description 3. **Use for documentation**: save the output as part of your database documentation 4. **Combine with other tools**: use alongside `DESCRIBE` and `SHOW` commands for complete understanding 5. **Share with team members**: use the output to help new team members understand the database structure ### Notes MotherDuck AI operates on your current database by evaluating the schemas and contents of the database. You can specify which tables and columns should be considered using the optional `include_tables` [parameter](../prompt-sql/#include-tables-parameter). By default, all tables in the current database are considered. To point MotherDuck AI at a specific database, execute the `USE database` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)). These capabilities are provided by MotherDuck's integration with OpenAI. For availability and pricing, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model).
If you need higher usage limits or have specific requirements, please reach out to the [Slack support channel](https://slack.motherduck.com/) or email [support@motherduck.com](mailto:support@motherduck.com). --- --- sidebar_position: 0.8 title: PROMPT_SQL --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; ## Overview The `prompt_sql` function allows you to generate SQL queries using natural language. Simply describe what you want to analyze in plain English, and MotherDuck AI will translate your request into a valid SQL query based on your database schema and content. This function helps users who are less familiar with SQL syntax generate queries, and helps experienced SQL users save time when working with unfamiliar schemas. ## Syntax ```sql CALL prompt_sql('<natural language question>'[, include_tables=<array or map>]); ``` ## Parameters | Parameter | Type | Description | Required | |-----------|------|-------------|----------| | `natural language question` | STRING | Your query in plain English describing the data you want to analyze | Yes | | `include_tables` | ARRAY or MAP | Specifies which tables and columns to consider for query generation. When not provided, all tables in the current database will be considered. | No | ### Include tables parameter You can specify which tables and columns should be considered during SQL generation using the `include_tables` parameter. This is particularly useful when: - You want to focus on specific tables in a large database - You want to improve performance by reducing the schema analysis scope The parameter accepts three formats: 1. **Array of table names**: include all columns from specified tables: ```sql include_tables=['table1', 'table2'] ``` 2. **Map of tables to columns**: include only specific columns from tables: ```sql include_tables={'table1': ['column1', 'column2'], 'table2': ['column3']} ``` 3.
**Map with column regex patterns**: include columns matching patterns: ```sql include_tables={'table1': ['column_prefix.*', 'exact_column']} ``` ## Examples ### Basic example Let's start with a simple example using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news): ```sql CALL prompt_sql('what are the top domains being shared on hacker_news?'); ``` Output: | **query** | |-----------------| | SELECT regexp_extract(url, 'https?://([^/]+)') AS domain, COUNT(*) AS count FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY count DESC; | ### Intermediate example This example demonstrates how to generate a more complex query with filtering, aggregation, and time-based analysis: ```sql CALL prompt_sql('Show me the average score of stories posted by each author who has posted at least 5 stories in 2022, sorted by average score'); ``` Output: | **query** | |-----------------| | SELECT "by", AVG(score) AS average_score FROM hn.hacker_news WHERE EXTRACT(YEAR FROM "timestamp") = 2022 GROUP BY "by" HAVING COUNT(id) >= 5 ORDER BY average_score; | ### Advanced example: multi-table analysis with specific columns This example shows how to generate a query that focuses on specific columns: ```sql CALL prompt_sql( 'Find the top 10 users who submitted the most stories with the highest average scores in 2023', include_tables={ 'hn.hacker_news': ['id', 'by', 'score', 'timestamp', 'type', 'title'] } ); ``` Output: | **query** | |-----------------| | SELECT "by", AVG(score) AS avg_score, COUNT(*) AS story_count FROM hn.hacker_news WHERE "type" = 'story' AND EXTRACT(YEAR FROM "timestamp") = 2023 GROUP BY "by" ORDER BY story_count DESC, avg_score DESC LIMIT 10; | ### Expert example This example demonstrates generating a complex query with subqueries, window functions, and complex logic: ```sql CALL prompt_sql('For each month in 2022, show me the top 3 users who posted stories with the highest scores, and how their average score compares to the previous month'); ``` Output: | **query** | |-----------------| | WITH monthly_scores AS (SELECT "by" AS user, DATE_TRUNC('month', "timestamp") AS month, AVG(score) AS avg_score FROM hn.hacker_news WHERE "type" = 'story' AND DATE_PART('year', "timestamp") = 2022 GROUP BY user, month), ... | ## Failure example This example shows that for some complex queries, the model might not generate a valid SQL query. In that case, the output will be the following error message: ```sql CALL prompt_sql('Identify the most discussed technology topics in Hacker News stories from the past year based on title keywords, and show which days of the week have the highest engagement for each topic'); ``` Output: | **query** | |-----------------| | Invalid Input Error: The AI could not generate valid SQL. Try re-running the command or rephrasing your question. | To generate a valid SQL query, you can try to break down the question into simpler parts. ## Best practices 1. **Be specific in your questions**: the more specific your natural language query, the more accurate the generated SQL will be. 2. **Start simple and iterate**: begin with basic queries and gradually add complexity as needed. 3. **Use the `include_tables` parameter**: when working with large databases, specify relevant tables to improve performance and accuracy. 4. **Review generated SQL**: always review the generated SQL before executing it, especially for complex queries. 5. **Understand your schema**: knowing your table structure helps you phrase questions that align with available data. 6. **Use domain-specific terminology**: include field names in your questions when possible. 7. **Provide context in your questions**: mention time periods, specific metrics, or business context to get more relevant results. ## Notes - By default, all tables in the current database are considered. Use the `include_tables` parameter to narrow the scope. - To target a specific database, first execute the `USE <database name>` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)). - The quality of generated SQL depends on the clarity of your natural language question and the quality of your database schema (table and column names). - This feature is powered by MotherDuck's integration with OpenAI's language models. ## Troubleshooting If you encounter issues with the `prompt_sql` function, consider the following troubleshooting steps: 1. **Check your database schema**: ensure that the tables and columns you're querying are present in the current database. 2. **Be specific in your questions**: the more specific your natural language query, the more accurate the generated SQL will be. 3. **Use the `include_tables` parameter**: when working with large databases, specify relevant tables to improve performance and accuracy. --- --- sidebar_position: 1 title: ATTACH --- # ATTACH \<database\> A local database can be attached in order to access local data, and a remote MotherDuck database that a user has [created](create-database.md) and has previously [detached](detach-database.md) may be re-attached. To attach to a MotherDuck database, the `md:` prefix is used. ## Syntax ```sql ATTACH 'md:<database_name>' ``` Parameters: * `database_name`: The name of the database to which to connect. If omitted, it defaults to 'workspace', which connects to all databases.
## Example usage ```sql -- Connect to a specific MotherDuck database ATTACH 'md:<database_name>'; -- Connect to all MotherDuck databases in the workspace: ATTACH 'md:'; -- Connect to a local database ATTACH '/path/to/my_database.duckdb'; ATTACH 'a_new_local_duckdb'; ``` ## Important Notes * Local database `ATTACH` operations: * Are temporary and last only for the current session * Data stays local and isn't uploaded to MotherDuck * Use file paths instead of share URLs * MotherDuck database `ATTACH` operations: * Are persistent, as they attach the database/share to your MotherDuck account * The database must have been created by the active user and must have already been detached. * If the remote database was not detached prior to running the `ATTACH` command, using the `md:` prefix will produce an error rather than creating a local database and attaching it. * For a remote MotherDuck database, the database name is used to indicate what to attach and no alias is permitted. ## Troubleshooting ### Handling name conflicts between local and remote databases In case of a name conflict between a local database and a remote database, there are two possible paths: 1. Attach the local database with a different name using an alias with `AS`. For instance: `ATTACH 'my_db.db' AS my_new_name` 2. Create a share out of your remote database and attach it with an alias. Shares are read-only. --- --- sidebar_position: 1 title: ATTACH --- # ATTACH \<share URL\> Sharing in MotherDuck is done through shares. The recipient of a share must `ATTACH` the share, which creates a read-only database. This is a zero-copy, zero-cost, metadata-only operation. [Learn more about sharing in MotherDuck](/key-tasks/sharing-data/sharing-overview.md). ## Syntax ```sql ATTACH '<share URL>' [AS <database name>]; ``` ### Shorthand Convention You may choose to name the new database by using `AS <database name>`. If you omit this clause, the new database will be given the same name as the source database that's being shared. ## Example usage ```sql ATTACH 'md:_share/ducks/0a9a026ec5a55946a9de39851087ed81' AS birds; -- attaches the share as database `birds` ATTACH 'md:_share/ducks/0a9a026ec5a55946a9de39851087ed81'; -- attaches the share as database `ducks` ``` --- --- sidebar_position: 3 title: Identify client connection and DuckDB ID --- :::info This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility. ::: # Identify client connection and DuckDB ID `md_current_client_connection_id` and `md_current_client_duckdb_id` are two scalar functions that can be used to identify the current `client_connection_id` and `client_duckdb_id`. ## Syntax ```sql SELECT md_current_client_connection_id(); SELECT md_current_client_duckdb_id(); ``` ## Example usage To [interrupt](documentation/sql-reference/motherduck-sql-reference/connection-management/interrupt-connections.md) all server-side connections that are initiated by the current client DuckDB instance, we can use: ```sql SELECT md_interrupt_server_connection(client_connection_id) FROM md_active_server_connections() WHERE client_duckdb_id = md_current_client_duckdb_id() AND client_connection_id != md_current_client_connection_id(); ``` --- --- sidebar_position: 2 title: Interrupting active server connections --- :::info This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility. ::: # Interrupting active server connections The `md_interrupt_server_connection` scalar function can be used to interrupt an active transaction on a server-side connection.
This will interrupt and roll back the active transaction (for example, a long-running query), but will allow the connection to be used for future transactions and queries. The function takes as input the `client_connection_id`, i.e. the unique identifier for the client DuckDB connection that initiated the server connection. ## Syntax ```sql SELECT md_interrupt_server_connection('<client_connection_id>'); ``` ## Example usage Interrupting a specific connection: ```sql SELECT md_interrupt_server_connection('2601e799-51b3-47a7-a64f-18688d148887'); ``` Using `md_interrupt_server_connection` in conjunction with [`md_active_server_connections`](documentation/sql-reference/motherduck-sql-reference/connection-management/monitor-connections.md) to interrupt a subset or all of the currently active connections: ```sql -- Interrupt all connections where a `CREATE TABLE` query is running SELECT md_interrupt_server_connection(client_connection_id) FROM md_active_server_connections() WHERE starts_with(client_query, 'CREATE TABLE'); ``` --- --- sidebar_position: 1 title: Monitoring active server connections --- :::info This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility. ::: # Monitoring active server connections The `md_active_server_connections` table function can be used to list all server-side connections that have active transactions. ## Syntax ```sql FROM md_active_server_connections(); ``` This returns a list of active server connections, with the following information: | **column_name** | **column_type** | **description** | |---------------------------------|-----------------|----------------------------------------------------------------------------------| | client_duckdb_id | UUID | Unique identifier for the client DuckDB instance that initiated the connection | | client_connection_id | UUID | Unique identifier for the client DuckDB connection that initiated the connection | | client_transaction_id | UBIGINT | Identifier for the transaction within the current connection | | server_transaction_stage | VARCHAR | Stage the server-side transaction is in | | server_transaction_elapsed_time | INTERVAL | How long the server-side transaction has been in the current stage | | client_query_id | UBIGINT | Identifier for the query within the current transaction | | client_query | VARCHAR | Query string (possibly truncated) | | server_query_elapsed_time | INTERVAL | How long the query has been running on the server-side | | server_query_progress | DOUBLE | Progress information (value between 0.0 and 1.0) | | server_interrupt_elapsed_time | INTERVAL | How long the connection has been interrupted | | server_interrupt_reason | VARCHAR | Why the connection was interrupted | --- --- sidebar_position: 1 title: COPY FROM DATABASE --- # COPY FROM DATABASE The `COPY FROM DATABASE` statement creates a new database from an existing one.
It can be used to: - [Interact with MotherDuck Databases](#copy-a-motherduck-database-to-a-motherduck-database) - Copy MotherDuck databases to MotherDuck databases - [Interact with Local Databases](#interacting-with-local-databases) - Copy local databases to MotherDuck databases - Copy MotherDuck databases to local databases - Copy local databases to local databases # Syntax The `COPY FROM DATABASE` statement has the following syntax: ```sql COPY FROM DATABASE <source database> TO <target database> [ (SCHEMA) ] ``` ## Example usage ### Copy a MotherDuck database to a MotherDuck database This is the same as [creating a new database from an existing one](/sql-reference/motherduck-sql-reference/create-database.md). ```sql COPY FROM DATABASE my_db TO my_db_copy; ``` ### Interacting with Local Databases These operations require access to the local filesystem, e.g. from inside the DuckDB CLI. #### Copy a local database to a MotherDuck database ```sql ATTACH 'md:'; ATTACH 'local_database.db' AS local_db; CREATE DATABASE md_database; COPY FROM DATABASE local_db TO md_database; ``` #### Copy a MotherDuck database to a local database Copying a MotherDuck database to a local database requires a few extra steps. ```sql ATTACH 'md:'; ATTACH 'local_database.db' AS local_db; COPY FROM DATABASE my_db TO local_db; ``` #### Copy a local database to a local database To copy a local database to a local database, please see the [DuckDB documentation](https://duckdb.org/docs/stable/sql/statements/copy.html#copy-from-database--to). ### Copying the Database Schema ```sql COPY FROM DATABASE my_db TO my_db_copy (SCHEMA); ``` This will copy the schema of the database, but not the data. --- --- sidebar_position: 1 title: CREATE DATABASE --- # CREATE DATABASE The `CREATE DATABASE` statement creates a new database in MotherDuck. In addition to creating a database, it can be used to copy entire databases into MotherDuck from your local environment, or to zero-copy clone databases inside of MotherDuck. `CREATE DATABASE FROM` enables you to: * create a **remote database** (on MotherDuck) from a **local database**. * create a **remote database** (on MotherDuck) from a **remote database**. This is a metadata-only operation that copies no data. :::note To copy a MotherDuck database to a local database, use the [`COPY FROM DATABASE`](/sql-reference/motherduck-sql-reference/copy-database.md) statement. ::: # Syntax This SQL query creates a new database in MotherDuck. ```sql CREATE [ OR REPLACE ] DATABASE <database name> [ IF NOT EXISTS ] [ FROM CURRENT_DATABASE() | FROM <database or share name> | FROM '<local file path>' | FROM "md:_share/.../" ]; ``` You can also pass the name of an attached share or a share URL as `database name`, for example `CREATE DATABASE new_db FROM my_share` or `CREATE DATABASE new_db FROM "md:_share/..."`. If you attempt to create a database, yet a database with that name already exists, no new database will be created and the query will return an error. The error will be silenced when you specify `IF NOT EXISTS`. Similar to the DuckDB table name conventions, database names that start with a number or contain special characters must be double quoted when used. Example: `CREATE DATABASE "123db"` Creating a database does not automatically make it your current database: execute the SQL command `USE <database name>` to do so.
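For instance, a minimal sketch of creating a database and then switching to it (`my_new_db` is an illustrative name):

```sql
CREATE DATABASE my_new_db;
-- Creating the database does not switch to it; do that explicitly:
USE my_new_db;
```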
# Example usage ```sql CREATE OR REPLACE DATABASE ducks; -- if ducks database exists, even if populated, it will be replaced with an empty one CREATE DATABASE IF NOT EXISTS ducks_db; -- if ducks_db database exists, the operation will be skipped, but will not error ``` To *copy* an entire database from your local DuckDB instance into MotherDuck: ```sql USE ducks_db; CREATE DATABASE ducks FROM CURRENT_DATABASE(); CREATE OR REPLACE DATABASE ducks FROM ducks_db; -- alternative syntax ``` To zero-copy clone a database in MotherDuck ```sql CREATE DATABASE cloud_db FROM another_cloud_db; ``` To upload a local DuckDB database file ```sql CREATE DATABASE flying_ducks FROM './databases/local_ducks.db'; ``` To upload an attached local DuckDB database ```sql ATTACH './databases/local_ducks.db'; CREATE DATABASE flying_ducks FROM local_ducks; ``` --- --- sidebar_position: 1 title: CREATE SECRET --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; # CREATE SECRET MotherDuck enables you to store your cloud storage credentials for convenience, using the familiar DuckDB `CREATE SECRET` syntax. See [DuckDB CREATE SECRET documentation](https://duckdb.org/docs/sql/statements/create_secret.html). Make sure to add either the `PERSISTENT` keyword or the `IN MOTHERDUCK` clause to create MotherDuck secrets. Secrets stored in MotherDuck are fully encrypted. :::note You can use the `PERSISTENT` keyword to create a local file persistent secret in DuckDB as well. It gets stored unencrypted in the `~/.duckdb/stored_secrets` directory. When you've loaded the MotherDuck extension, `PERSISTENT` secrets are stored encrypted in MotherDuck. Locally persisted secrets are not impacted. You can still create locally persisted secrets when using MotherDuck by specifying the secret storage backend: `CREATE SECRET IN LOCAL_FILE`. ::: When using MotherDuck, the statement below creates a cloud-persistent secret stored in MotherDuck. # Syntax ```sql CREATE [OR REPLACE] PERSISTENT SECRET [secret_name] ( TYPE <secret type>, <secret parameters> ); ``` ```sql CREATE [OR REPLACE] SECRET [secret_name] IN MOTHERDUCK ( TYPE <secret type>, <secret parameters> ); ``` # Example Usage To manually create an S3 secret in MotherDuck: ```sql CREATE SECRET IN MOTHERDUCK ( TYPE S3, KEY_ID 's3_access_key', SECRET 's3_secret_key', REGION 'us-east-1' ); ``` This creates a new secret with a default name (i.e. `__default_s3`) and a default scope (i.e. `[s3://, s3n://, s3a://]`) used for path matching explained below. ## Secret Providers MotherDuck supports the same [secret providers](https://duckdb.org/docs/configuration/secrets_manager.html#secret-providers) as DuckDB. To create a secret by automatically fetching credentials using mechanisms provided by the AWS SDK, see [AWS CREDENTIAL_CHAIN provider](https://duckdb.org/docs/extensions/httpfs/s3api#credential_chain-provider). To create a secret by automatically fetching credentials using mechanisms provided by the Azure SDK, see [Azure CREDENTIAL_CHAIN provider](https://duckdb.org/docs/extensions/azure#credential_chain-provider). To create a secret by automatically fetching credentials using mechanisms provided by the Hugging Face CLI, see [Hugging Face CREDENTIAL_CHAIN provider](https://duckdb.org/docs/extensions/httpfs/hugging_face#authentication). To store a secret from a given secret provider in MotherDuck, simply specify the `PERSISTENT` keyword or the `IN MOTHERDUCK` clause in addition.
# Example Usage To store a secret configured through `aws configure`: ```sql CREATE PERSISTENT SECRET aws_secret ( TYPE S3, PROVIDER CREDENTIAL_CHAIN ); ``` To store a secret configured through `az configure`: ```sql CREATE SECRET azure_secret IN MOTHERDUCK ( TYPE AZURE, PROVIDER CREDENTIAL_CHAIN, ACCOUNT_NAME 'some-account' ); ``` ## Querying with Secrets [Secret scope](https://duckdb.org/docs/configuration/secrets_manager.html#creating-multiple-secrets-for-the-same-service-type) is supported in the same way as in DuckDB to allow multiple secrets of the same type to be stored in MotherDuck. When there are multiple local (i.e. in-memory and local-file) and remote (i.e. MotherDuck) secrets of the same type, scope matching (the secret scope against the file path) determines which secret to use to open a file. Both local and remote secrets are considered in scope matching. In the case of multiple matching secrets, the secret with the longest matching scope prefix is chosen. In the case of multiple secrets stored in different secret storages sharing the same scope (e.g. the default scope if not specified), the matching secret is chosen based on the following order: local temp secret > local_file secret > MotherDuck secret. To see which secret (either local or remote) is being used by MotherDuck, the DuckDB `which_secret` table function can be used, which takes a path and the secret type. # Example Usage To see which secret is used to open a file: ```sql FROM which_secret('s3://my-bucket/my_dataset.parquet', 's3'); ┌───────────────────────┬────────────┬────────────┐ │ name │ persistent │ storage │ │ varchar │ varchar │ varchar │ ├───────────────────────┼────────────┼────────────┤ │ __default_s3 │ PERSISTENT │ motherduck │ └───────────────────────┴────────────┴────────────┘ ``` --- --- sidebar_position: 1 title: CREATE SHARE --- # CREATE SHARE The `CREATE SHARE` statement creates a new share from a database. This command is used to share databases with other users. [Learn more about sharing in MotherDuck](/key-tasks/sharing-data/sharing-overview.md). ## Syntax ```sql CREATE [ OR REPLACE ] SHARE [ IF NOT EXISTS ] [<share name>] [FROM <database name>] ([ACCESS ORGANIZATION | UNRESTRICTED | RESTRICTED], [VISIBILITY DISCOVERABLE | HIDDEN], [UPDATE AUTOMATIC | MANUAL]); ``` If you attempt to create a share, yet a share with that name already exists, no new share will be created and the query will return an error. The error will be silenced when you specify `IF NOT EXISTS`. This statement returns a share URL of the form `md:_share/<share name>/<share id>`. - If the share is **Hidden**, you must pass this URL to the **data consumer**, who will need to [`ATTACH`](attach-share.md) the share. - If the share is **Discoverable**, passing the URL to the **data consumer** is optional. ### _ACCESS_ Clause You can configure the scope of access of the share: - `ACCESS ORGANIZATION` (default) - only members of your Organization can access the share. - `ACCESS UNRESTRICTED` - all MotherDuck users in all Organizations can access the share. - `ACCESS RESTRICTED` - the share owner will be the only user with access to the share initially. Access for other users of the share can be updated via the [`GRANT`](grant-access.md) and [`REVOKE`](revoke-access.md) commands. If omitted, defaults to `ACCESS ORGANIZATION`. ### _VISIBILITY_ Clause For Organization scoped shares **only**, you may choose to make them Discoverable: - `VISIBILITY DISCOVERABLE` (default) - all members of your Organization will be able to list/find the share in the UI or SQL.
- `VISIBILITY HIDDEN` - the share can only be accessed directly by the share URL, and is not listed. If omitted, Organization-scoped and Restricted shares default to `VISIBILITY DISCOVERABLE`. Unrestricted shares can only be **Hidden**. ### _UPDATE_ Clause Shares can be automatically or manually updated by the share creator. - `UPDATE MANUAL` (default) - shares are only updated via the [`UPDATE SHARE`](update-share.md) command. - `UPDATE AUTOMATIC` - the share is automatically updated when the underlying database changes. Typically, changes to the underlying database are automatically published to the share within at most 5 minutes after writes have completed. Ongoing overlapping writes may delay share updates. If omitted, defaults to `UPDATE MANUAL`. ### Shorthand Convention - If the database name is omitted, a share will be created from the current/active database. - If the share name is omitted, the share will be named after the source database. - If both database and share names are omitted, the share will be named and created after the current/active database. ## Example Usage ```sql -- If ducks_share exists, it will be replaced with a new share. A new share URL is returned. CREATE OR REPLACE SHARE ducks_share; -- If ducks_share exists, nothing is done. Its existing share URL is returned. Otherwise, a new share is created and its share URL is returned. CREATE SHARE IF NOT EXISTS ducks_share; ``` ```sql USE mydb; CREATE SHARE; -- creates an Organization-scoped, Discoverable share named `mydb` CREATE SHARE FROM db2; -- creates an Organization-scoped, Discoverable share named `db2` CREATE SHARE birds FROM birds (ACCESS ORGANIZATION, VISIBILITY HIDDEN, UPDATE AUTOMATIC); ``` :::note All shares created prior to June 6, 2024 are Unrestricted and Hidden. To make these legacy shares Organization-scoped and Discoverable, you can alter them in the UI or delete and create new shares. Users of DuckDB below version 1.1.1 do not have access to the `UPDATE` option. ::: --- --- sidebar_position: 1 title: CREATE SNAPSHOT --- # CREATE SNAPSHOT `CREATE SNAPSHOT OF <database name>` creates a new read-only snapshot of the specified database for read-scaling ducklings. Only one database can be snapshotted per command. In the background, a snapshot of each database is taken every minute to sync changes with read-scaling ducklings. If writing queries are active on a database, the snapshot is skipped to avoid disruption. To force a snapshot, run `CREATE SNAPSHOT` manually. This command will wait on any ongoing write queries on the database to complete, and prevent new ones from starting. As soon as all ongoing write queries are completed, the command creates the snapshot, ensuring that read-scaling connections can access the most up-to-date data. Read-scaling instances pick up the latest available snapshot every minute. To minimize delay and ensure access to the latest data, use `CREATE SNAPSHOT` on the writer connection, followed by a `REFRESH DATABASE <database name>` on the read-scaling connection. ```sql CREATE SNAPSHOT OF <database name>; ``` Learn more about [REFRESH DATABASE](/sql-reference/motherduck-sql-reference/refresh-database.md). --- --- sidebar_position: 1 title: DROP SECRET --- # DROP SECRET The DuckDB `DROP SECRET` statement (see DuckDB [DROP SECRET documentation](https://duckdb.org/docs/sql/statements/create_secret#syntax-for-drop-secret)) works in MotherDuck to delete a secret previously created with the `CREATE SECRET` statement.
# Syntax ```sql DROP SECRET <secret name>; ``` When there are multiple secrets with the same name stored in different secret storages (e.g. in memory vs. in MotherDuck), either the persistence type or the secret storage type needs to be specified to remove ambiguity when dropping the secret. # Example Usage Disambiguate by specifying the storage type when dropping a secret: ```sql DROP SECRET __default_s3 FROM motherduck; ``` Disambiguate by specifying the persistence type when dropping a secret: ```sql DROP PERSISTENT SECRET __default_s3; ``` --- --- sidebar_position: 1 title: DESCRIBE SHARE --- # DESCRIBE SHARE The `DESCRIBE SHARE` statement is used to get details about a specific share. :::info The **creator** of the share object can execute this statement by passing the **share name**. The **receiver** of the share object can execute this statement by passing the **share link**. ::: # Syntax ```sql DESCRIBE SHARE [<share name> | <share URL>]; ``` # Example Let's use the `sample_data` database, which is auto-attached for MotherDuck users, to illustrate the command: ```sql DESCRIBE SHARE 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'; ``` It returns a table with the following columns: | column_name | column_type | description | |---------------| ----------- |-----------------------------| | name | VARCHAR | Name of the share | | url | VARCHAR | URL of the share | | source_db_name | VARCHAR | Name of the database shared | | source_db_uuid | UUID | UUID of the database shared | | access | VARCHAR | Whether anyone (referred to as UNRESTRICTED) or only organization members (referred to as ORGANIZATION) can attach to the share by its share_url | | visibility | VARCHAR | Whether the share is DISCOVERABLE or HIDDEN | | update | VARCHAR | The share’s update mode (MANUAL vs. AUTOMATIC) | | created_ts | TIMESTAMP WITH TIME ZONE | The share’s creation time | You can be specific about which columns you want to return by using the table function: ```sql SELECT name, url, source_db_name FROM md_describe_database_share('md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'); ``` --- --- sidebar_position: 1 title: DETACH --- # DETACH \<database\> After a database has been created, it can be detached. This will prevent queries from accessing or modifying that database while it is detached. This command may be used on both local DuckDB databases and remote MotherDuck databases. For a local database, specify the name of the database to detach and not the full path. In the case of a remote MotherDuck database, the [`ATTACH`](attach-database.md) command can be used to re-attach at any point, so this is designed to be a convenience feature, not a security feature. `DETACH` can be used to isolate work on specific databases, while preserving the contents of the detached databases. To see all databases, both attached and detached, use the [`SHOW ALL DATABASES` command](show-databases.md). # Syntax ```sql DETACH <database name>; ``` # Example usage ```sql -- Prior command: -- ATTACH '/path/to/local_database.duckdb'; DETACH local_database; -- Prior command: -- CREATE DATABASE my_md_database; DETACH my_md_database; ``` --- --- sidebar_position: 1 title: DETACH --- # DETACH \<share name\> Attached shares are sticky, and will continue to appear in your catalog unless you explicitly detach them with: # Syntax ```sql DETACH <share name>; ``` # Example usage ```sql DETACH ducks; ``` --- --- sidebar_position: 1 title: DROP DATABASE --- # DROP DATABASE The `DROP` statement removes a database entry added previously with the `CREATE` command.
By default (or if the `RESTRICT` clause is provided), the entry will not be dropped if there are any existing database shares that were created from it. If the `CASCADE` clause is provided, then all the shares that are dependent on the database will be dropped as well. # Syntax ```sql DROP DATABASE [IF EXISTS] <database name> [CASCADE | RESTRICT]; ``` # Example usage ```sql DROP DATABASE ducks; -- drops database named `ducks` DROP DATABASE ducks CASCADE; -- drops database named `ducks` and all the shares created from `ducks` ``` --- --- sidebar_position: 1 title: DROP SHARE --- # DROP SHARE `DROP SHARE` is used to delete a share by the share creator. Users who have attached the share will lose access. This will throw an error if the share does not exist. `DROP SHARE IF EXISTS` is used to delete a share by the share creator and will not throw an error if the share does not exist. # Syntax ```sql DROP SHARE "<share name>"; DROP SHARE IF EXISTS "<share name>"; ``` --- --- sidebar_position: 1 title: GRANT READ ON SHARE --- # GRANT READ ON SHARE For restricted shares, use the `GRANT` command to explicitly give users access to the share. After a user has been `GRANT`-ed access they will still need to run an `ATTACH` command to be able to run queries against the shared database. Only the owner of the share can use the `GRANT` command to give access to others. ## Syntax ```sql GRANT READ ON SHARE <share name> TO <username> [, <username>, ...]; ``` ## Example usage ```sql GRANT READ ON SHARE birds TO duck; -- gives the user with username 'duck' access to the share 'birds' GRANT READ ON SHARE taxis TO usr1, usr2; -- gives the users with usernames 'usr1' and 'usr2' access to the share 'taxis' ``` --- --- sidebar_position: 1 title: LIST SECRETS --- # LIST SECRETS Secrets can be listed in the same way as in DuckDB by using the table function `duckdb_secrets()`. # Syntax ```sql FROM duckdb_secrets(); ``` | name | type | provider | persistent | storage | scope | secret_string | |-----------------|-------|------------------|------------|------------|-------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | __default_azure | azure | credential_chain | false | memory | [azure://, az://] | name=__default_azure;type=azure;provider=credential_chain;serializable=true;scope=azure://,az://;account_name=some-account | | __default_s3 | s3 | credential_chain | false | memory | [s3://, s3n://, s3a://] | name=__default_s3;type=s3;provider=credential_chain;serializable=true;scope=s3://,s3n://,s3a://;endpoint=s3.amazonaws.com;key_id=AKIA3N4CIOGGCVTOBT4J;region=us-east-1;secret=redacted;session_token=redacted | | __default_r2 | r2 | config | true | motherduck | [r2://] | name=__default_r2;type=r2;provider=config;serializable=true;scope=r2://;endpoint=my_account.r2.cloudflarestorage.com;key_id=my_key;region=us-east-1;s3_url_compatibility_mode=0;secret=redacted;session_token=redacted;url_style=path;use_ssl=1 | | __default_gcs | gcs | config | true | motherduck | [gcs://, gs://] | name=__default_gcs;type=gcs;provider=config;serializable=true;scope=gcs://,gs://;endpoint=storage.googleapis.com;key_id=my_key;region=us-east-1;s3_url_compatibility_mode=0;secret=redacted;session_token=redacted;url_style=path;use_ssl=1 | :::note DuckDB allows you to specify `redact` when listing secrets (it's set to `true` by default). However, MotherDuck secrets are always redacted for security reasons despite the flag.
::: # Example Usage To inspect specific field(s) in `secret_string`: ```sql select name, storage, list_filter(split(secret_string,';'), x -> starts_with(x, 'region'))[1] from duckdb_secrets(redact=false) where name='__default_s3'; ``` --- --- sidebar_position: 1 title: LIST SHARES --- # LIST SHARES The `LIST SHARES` statement is used to list all shares that you've created. You can also use the table function `md_list_database_shares()`. :::tip Looking for all the shares that you have access to? Take a look at the [md_information_schema](/sql-reference/motherduck-sql-reference/md_information_schema/shared_with_me/). ::: # Syntax ```sql -- using DDL LIST SHARES; -- using table function SELECT name, url, source_db_name FROM md_list_database_shares(); ``` --- --- sidebar_position: 1 title: MD_RUN parameter --- # MD_RUN parameter For certain DuckDB **Table Functions**, MotherDuck now provides an additional parameter, `MD_RUN`, that gives explicit control over where the query is executed. This parameter is available to the following functions: - `read_csv()` - `read_csv_auto()` - `read_json()` - `read_json_auto()` - `read_parquet()` and its alias `parquet_scan()` To leverage the `MD_RUN` parameter, you can choose: - `MD_RUN=LOCAL` executes the function in your local DuckDB environment - `MD_RUN=REMOTE` executes the function in MotherDuck-hosted DuckDB runtimes in the cloud - `MD_RUN=AUTO` executes remotely all s3://, http://, and https:// requests, except those to localhost/127.0.0.1. This is the default option. The following is an example of invoking this parameter to execute the function remotely: ```sql SELECT * FROM read_csv_auto( 'https://github.com/duckdb/duckdb/raw/main/data/csv/ips.csv.gz', MD_RUN=REMOTE) LIMIT 100 ``` In this example `MD_RUN=REMOTE` is redundant, because omitting it implies `MD_RUN=AUTO`, and given that this is a non-local https:// resource, MotherDuck will automatically choose remote execution anyway. One can force local execution with `MD_RUN=LOCAL`. Be aware that DuckDB-WASM does not support reading compressed files yet, so inside the web browser one would get an error for this particular file as it is ips.csv**.gz** (it does work locally from the CLI or e.g. a Python notebook). --- --- sidebar_position: 1 title: DATABASES view --- # DATABASES view The `MD_INFORMATION_SCHEMA.DATABASES` view provides information about the current databases that the user created. ## Schema When you query the `MD_INFORMATION_SCHEMA.DATABASES` view, the query results contain one row for each database that the current user created.
The `MD_INFORMATION_SCHEMA.DATABASES` view has the following schema: | Column Name | Data Type | Value | |-------------|-----------|-----------------------------------| | NAME | STRING | The name or alias of the database | | UUID | STRING | The UUID of the database | | CREATED_TS | TIMESTAMP | The database’s creation time | ## Example usage ```sql from MD_INFORMATION_SCHEMA.DATABASES; ``` | name | uuid | created_ts | |----------------------|--------------------------------------|------------------------| | tpch_sf1000_template | 2c80b37d-d307-44d8-aff6-33ea2294bd35 | 2024-10-21 14:26:30-04 | | db1 | 445864c7-5758-42a2-9a5c-2f16620ebc9f | 2024-09-15 09:32:05-04 | | foo | 4d829a9e-e0da-408c-aafa-0fc50186a588 | 2024-09-03 13:32:10-04 | | tpch_sf1000 | fc4bf9f4-80d1-4fd9-b6fe-d6d71f40ef42 | 2024-10-21 14:26:30-04 | --- --- title: Introduction to MD_INFORMATION_SCHEMA description: Introduction to MD_INFORMATION_SCHEMA --- # Introduction to MD_INFORMATION_SCHEMA The MotherDuck `MD_INFORMATION_SCHEMA` views are read-only, system-defined views that provide metadata information about your MotherDuck objects. The following table lists all `MD_INFORMATION_SCHEMA` views that you can query to retrieve metadata information: | Resource Type | MD_INFORMATION_SCHEMA View | |-----------------|----------------------------------| | Database | [DATABASES](databases.md) | | Database Shares | [OWNED_SHARES](owned_shares.md)<br/>[SHARED_WITH_ME](shared_with_me.md) | ## Example usage ```sql -- list all databases you created from md_information_schema.databases; -- list all shares you created from md_information_schema.owned_shares; -- select specific columns select name, url, access, visibility from md_information_schema.owned_shares; -- set md_information_schema as the current database use md_information_schema; -- list all the views in md_information_schema show tables; ``` --- --- sidebar_position: 2 title: OWNED_SHARES view --- # OWNED_SHARES view The `MD_INFORMATION_SCHEMA.OWNED_SHARES` view provides information about the shares that the current user created. ## Schema When you query the `MD_INFORMATION_SCHEMA.OWNED_SHARES` view, the query results contain one row for each share that the current user created. The `MD_INFORMATION_SCHEMA.OWNED_SHARES` view has the following schema: | Column Name | Data Type | Value | |-------------|-----------|-----------------------------------| | NAME | STRING | The name of the share | | URL | STRING | The share_url which can be used to attach the share | | SOURCE_DB_NAME | STRING | The name of the database where this share was created from | | SOURCE_DB_UUID | UUID | UUID of the database where this share was created from | | ACCESS | STRING | Whether anyone (referred to as UNRESTRICTED) or only organization members (referred to as ORGANIZATION) can attach to the share by its share_url | | GRANTS | STRUCT(username VARCHAR, access VARCHAR)[] | A list of all grants that are active for the share | | VISIBILITY | STRING | Whether the share is DISCOVERABLE or HIDDEN | | UPDATE | STRING | The share’s update mode (MANUAL vs. AUTOMATIC) | | CREATED_TS | TIMESTAMP | The share’s creation time | ## Example usage ```sql from MD_INFORMATION_SCHEMA.OWNED_SHARES; select name, url, created_ts from MD_INFORMATION_SCHEMA.OWNED_SHARES; ``` | name | url | source_db_name | |----------|---------------------------------------------------------|----------------| | my_share | md:_share/my_share/2ef6b580-2445-4f4f-bce8-c13a85812464 | db1 | --- --- sidebar_position: 3 title: SHARED_WITH_ME view --- # SHARED_WITH_ME view The `MD_INFORMATION_SCHEMA.SHARED_WITH_ME` view provides information about all shares that the current user can attach to (excluding their own created shares). ## Schema When you query the `MD_INFORMATION_SCHEMA.SHARED_WITH_ME` view, the query results contain one row for each share that the current user can discover. The `MD_INFORMATION_SCHEMA.SHARED_WITH_ME` view has the following schema: | Column Name | Data Type | Value | |-------------|-----------|-----------------------------------| | NAME | STRING | The name of the share | | URL | STRING | The share_url which can be used to attach the share | | CREATED_TS | TIMESTAMP | The share’s creation time | | UPDATE | STRING | The share’s update mode (MANUAL vs. AUTOMATIC) | | ACCESS | STRING | Whether anyone (referred to as UNRESTRICTED) or only organization members (referred to as ORGANIZATION) can attach to the share by its share_url | ## Example usage ```sql from MD_INFORMATION_SCHEMA.SHARED_WITH_ME; ``` | name | url | created_ts |update | access | |-----------------------------|----------------------------------------------------------------------------|------------------------|----------|-----------| | efs_ia_benchmark | md:_share/efs_ia_benchmark/11597119-359a-4e02-8e5c-bc2b9b8c1908 | 2024-07-16 15:09:11-04 | MANUAL | ORGANIZATION | | hf_load_test_share | md:_share/hf_load_test_share/f76062a5-f1f5-4024-987d-fc2eea48311b | 2024-07-29 09:07:33-04 | MANUAL | ORGANIZATION | | mdw | md:_share/mdw/87be4635-fbfd-4d4b-9cae-b629842733d5 | 2024-10-16 17:04:42-04 | AUTOMATIC | ORGANIZATION | | my_sample_share | md:_share/my_sample_share/c4ee2a30-2fb6-4cb5-b664-9030ae43ffdc | 2024-09-24 17:53:23-04 | MANUAL | ORGANIZATION | --- --- sidebar_position: 5 title: Object name resolution --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; # Object name resolution ## Fully qualified naming convention Fully qualified names (FQN) in MotherDuck are of the form `<database>.<schema>.<object>`. The fully qualified naming convention allows you to query objects in MotherDuck regardless of context. Queryable objects can be tables and views. For example: ```sql SELECT * FROM mydatabase.myschema.mytable; ``` The fully qualified naming convention is useful when you want your SQL to execute reliably across multiple interfaces, by various users, or in programmatic scripts. ## Relative naming convention For convenience, MotherDuck enables you to omit database or schema when querying objects. When **database is omitted**, MotherDuck will attempt to resolve the query by using the current database: ```sql SELECT * FROM myschema.mytable; ``` When **both database and schema are omitted**, MotherDuck will first attempt to find the object in the current schema. Thereafter, it will attempt to find the object in other schemas in the current database. If the object name is ambiguous - for example if multiple tables with the same name exist in the database - MotherDuck will return an error: ```sql SELECT * FROM mytable; ``` You may also choose to **omit just the schema**. MotherDuck will first search the current schema, and thereafter will search for the object across all other schemas in the specified database: ```sql SELECT * FROM mydatabase.mytable; ``` --- --- sidebar_position: 1 title: PRINT_MD_TOKEN pragma --- # PRINT_MD_TOKEN pragma You can retrieve your MotherDuck authentication token using the `PRINT_MD_TOKEN` pragma. In the CLI or Python, to avoid having to re-authenticate every time, you can store your token as an environment variable; for example, by running `export motherduck_token='xxxx'` in the terminal. Be sure to replace 'xxxx' with your own token! # Syntax ```sql PRAGMA PRINT_MD_TOKEN; ``` --- --- sidebar_position: 1 title: REFRESH DATABASE --- # REFRESH DATABASE There are two types of databases that can be refreshed: **database shares** and databases attached to **read-scaling** connections. **Read-scaling** connections sync automatically every minute. To ensure maximum freshness, run `CREATE SNAPSHOT` on the writer, followed by `REFRESH DATABASE` on the reader. This pulls the latest snapshot. **Database shares** can also be refreshed, either automatically or manually.
In this case, the writer uses `UPDATE SHARE` instead of `CREATE SNAPSHOT`, followed by `REFRESH DATABASE` on the reader. ```sql REFRESH DATABASES; -- Refreshes all connected databases and shares ┌─────────┬───────────────────┬──────────────────────────────────┬───────────┐ │ name │ type │ fully_qualified_name │ refreshed │ │ varchar │ varchar │ varchar │ boolean │ ├─────────┼───────────────────┼──────────────────────────────────┼───────────┤ │ <name> │ motherduck │ md:<database name> │ false │ │ <name> │ motherduck share │ md:_share/<share name>/<id> │ true │ └─────────┴───────────────────┴──────────────────────────────────┴───────────┘ REFRESH DATABASE my_db; -- Alternatively, refresh a specific database ┌─────────┬──────────────────┬──────────────────────────────────┬───────────┐ │ name │ type │ fully_qualified_name │ refreshed │ │ varchar │ varchar │ varchar │ boolean │ ├─────────┼──────────────────┼──────────────────────────────────┼───────────┤ │ <name> │ motherduck share │ md:_share/<share name>/<id> │ false │ └─────────┴──────────────────┴──────────────────────────────────┴───────────┘ ``` Learn more about [CREATE SNAPSHOT](/sql-reference/motherduck-sql-reference/create-snapshot.md) and [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md). --- --- sidebar_position: 1 title: REVOKE READ ON SHARE --- # REVOKE READ ON SHARE For restricted shares, use the `REVOKE` command to explicitly remove share access from users that have an existing `GRANT`. After running a `REVOKE` command there may be a delay of a few minutes before access is fully removed if a user is currently querying the share. Only the owner of the share can use the `REVOKE` command to remove access from others. `GRANT` and `REVOKE` do not apply to `UNRESTRICTED` shares. ## Syntax ```sql REVOKE READ ON SHARE <share name> FROM <username> [, <username>, ...]; ``` ## Example usage ```sql REVOKE READ ON SHARE birds FROM duck; -- revokes access to the share 'birds' from the user with username 'duck' REVOKE READ ON SHARE taxis FROM usr1, usr2; -- revokes access to the share 'taxis' from the users with usernames 'usr1' and 'usr2' ``` --- --- sidebar_position: 1 title: SHOW ALL DATABASES --- # SHOW ALL DATABASES The `SHOW ALL DATABASES` statement shows all databases, whether MotherDuck databases, local DuckDB databases, or MotherDuck shares. It returns: * `alias` (`db_name` or `share_alias`) * `is_attached`, a flag indicating whether the database is attached * `type` (e.g. DuckDB, MotherDuck, MotherDuck share) * `fully_qualified_name` (empty, `md:_share/<share name>/<id>`, or `md:db_name`) To query specific columns, you can use the table function `MD_ALL_DATABASES()`.
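For instance, a small sketch using the columns listed above to show only attached shares (the filter value `'motherduck share'` follows the example output further below):

```sql
-- List the alias and URL of every attached MotherDuck share
SELECT alias, fully_qualified_name
FROM MD_ALL_DATABASES()
WHERE is_attached AND type = 'motherduck share';
```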
# Syntax ```sql SHOW ALL DATABASES; ``` or using the table function ```sql select * from MD_ALL_DATABASES(); ``` # Example usage ```sql SHOW ALL DATABASES; ``` Example output: ```bash ┌──────────────────────────────────────────┬─────────────┬──────────────────┬─────────────────────────────────────────────────────────────────────────────────────────┐ │ alias │ is_attached │ type │ fully_qualified_name │ │ varchar │ boolean │ varchar │ varchar │ ├──────────────────────────────────────────┼─────────────┼──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤ │ TEST_DB_02d6fc2158094bd693b6f285dbd402f7 │ true │ motherduck │ md:TEST_DB_02d6fc2158094bd693b6f285dbd402f7 │ │ TEST_DB_62b53d968a4f4b6682ed117a7251b814 │ true │ motherduck │ md:TEST_DB_62b53d968a4f4b6682ed117a7251b814 │ │ base │ false │ motherduck │ md:base │ │ base2 │ true │ motherduck │ md:base2 │ │ db1 │ false │ motherduck │ md:db1 │ │ integration_test_001 │ false │ motherduck │ md:integration_test_001 │ │ my_db │ true │ motherduck │ md:my_db │ │ my_share_1 │ true │ motherduck share │ md:_share/integration_test_001/18d6dbdb-e130-4cdf-97c4-60782ed5972b │ │ sample_data │ false │ motherduck │ md:sample_data │ │ source_db │ true │ motherduck │ md:source_db │ │ test_db_115 │ false │ motherduck │ md:test_db_115 │ │ test_db_28d │ false │ motherduck │ md:test_db_28d │ │ test_db_cc9 │ false │ motherduck │ md:test_db_cc9 │ │ test_share │ true │ motherduck share │ md:_share/source_db/b990b424-2f9a-477a-b216-680a22c3f43f │ │ test_share_002 │ true │ motherduck share │ md:_share/integration_test_001/06cc5500-e49a-4f62-9203-105e89a4b8ae │ ├──────────────────────────────────────────┴─────────────┴──────────────────┴─────────────────────────────────────────────────────────────────────────────────────────┤ │ 15 rows (15 shown) 4 columns │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` --- --- sidebar_position: 1 title: TEMPORARY TABLES --- # TEMPORARY TABLES The `CREATE TEMPORARY TABLE` statement creates a new temporary table from a SQL query. This command is used to create a local temporary table. [More information can be found in the DuckDB documentation.](https://duckdb.org/docs/sql/statements/create_table.html#temporary-tables) ## Syntax ```sql CREATE [ OR REPLACE ] TEMPORARY TABLE <table name> [ IF NOT EXISTS ]
AS ... ``` Temporary Tables can be created traditionally with column names and types, or with `CREATE TABLE ... AS SELECT` (CTAS). ### Shorthand Convention The word `TEMP` can be used interchangeably with `TEMPORARY`. ## Example Usage ```sql CREATE TEMPORARY TABLE flights AS FROM 'https://duckdb.org/data/flights.csv'; ``` This will create a local table with data from the DuckDB `flights.csv` file. ## Notes - Temporary Tables in MotherDuck persist locally, not on the server. As such, local resource constraints should be considered when using them. - Because they are bound to your session, when your session ends, any temporary tables will no longer be available. --- --- sidebar_position: 1 title: UPDATE SHARE --- # UPDATE SHARE Shares can either be manually or automatically updated by the share creator. All users of the share will automatically see share updates within 1 minute, containing both DDL (like CREATE TABLE) and DML (inserts, updates, or deletes) changes. These updates are transactionally consistent snapshots, i.e. never partial database updates. The share creator can have the share be automatically updated when the underlying database changes. This is done by specifying the `UPDATE AUTOMATIC` option during [share creation](create-share.md). Alternatively, the share creator can manually update the share with a new point-in-time snapshot of the database. This is done by running the `UPDATE SHARE` command. # Syntax ```sql UPDATE SHARE <share name>; ``` --- --- id: motherduck-rest-api title: "MotherDuck REST API" description: "" sidebar_label: Introduction sidebar_position: 0 hide_title: true custom_edit_url: null --- import ApiLogo from "@theme/ApiLogo"; import Admonition from '@theme/Admonition'; import Heading from "@theme/Heading"; import SchemaTabs from "@theme/SchemaTabs"; import TabItem from "@theme/TabItem"; import Export from "@theme/ApiExplorer/Export"; import DocCardList from '@theme/DocCardList'; The REST API methods are in 'Preview' and may change in the future. To better support scenarios that require some flexibility or dynamic configuration around managing a MotherDuck organization, we are exposing an OpenAPI endpoint with some new functionality. At the moment it enables limited management of users and tokens via HTTP without requiring a DuckDB + MotherDuck client to be running. All of the methods are authenticated using the token of an ADMIN user from your MotherDuck Organization, passed via the `Authorization` header with a value of `Bearer TOKEN`.
If you would like to generate your own OpenAPI client the spec file is located at https://api.motherduck.com/docs/specs --- --- id: users-create-service-account title: "Create new user" description: "Create user is currently restricted to creating a user with a 'Member' role" sidebar_label: "Create new user" hide_title: true hide_table_of_contents: true api: eJzlmFFv2zYQx78KIQzoNrhx3CYvfnNmBTOQ2p1s76GOG9DSxWYjURpJOfEEfffdUVIkK163wR4QIC8JRR2Pd3/+KPOYOXECihsRy1Hg9J1Ug9LvfQXcwBTUVvgw8P04lcbpOAFoX4mEjNH0F2vEaAQTmvmpUiBNuGMKtFHCNxAwEzPrS8g144XpozAbbL/7BNEK1Dum4hDQt+Fr7fQXRQDOsuNoQI/C7LAzcwap2cRK/MmLuRfLHC0U/JHiVFdxgEaZfRQKMAujUug4fiwNBkSveJKEwreDu980ecgc7W8g4tQyuwQwn3j1DXxKM1GkiRGg6S0FJHkE1k8YTu5tQOUYSlSuaQw3Bu2w6+vi9jbJbvIl9kZC3oBcm43T7+ETf6qePlxe5p1/4+X2Nrhb/vyDQ/nuy1/FxaJUG7bChZAC9WBCMi5ZrNZcVnrleaehzqJOaZnnxTudxFIX+X44P6d/+5NNU98Hre/TkFXGzv+i8L4i/xT4xaFYr3jAvAKN08UYYfZ8fSDE9rLMNsBAqVixakjHgSceJSG0YsspuOA/ubT2e/4GwzvP/W3uTmfkT2idFvGWHrlSfIcjhIFIH5FleyEqw2XeDnaA+NGkLL5nRTjMbLhhj6CgYkesQmD3mI+pMmsmhZv7b2YrBcOtYIRp61lK9CMy8dN3JBQSJ46KfdGYNCvXYk/RTi1Ii6pK6IUF0ZLYe0niSG55KAKGX8oAGRQ81K+OyEMxnoLM+Xgwn/068UZf3OGbRPOAsDWivWMQ3ZO2yehh3g6w+vElq3PJy99YTOy1QboX3CnovJ54V6Ph0B2/STSbctZMfjyGyVrQJpAtqA6QePGSxHFs2DWeOF8fhnVkp2BwPJndXU/m47f5eXzWsgbw4hgAazWbADZZekHf5aHT40jSCZyHzC2ze10ItsI7BYej8cz1xoObu6nr/e56d67nTbw3yeS+uBWYl8cdKA/Lu/+z3UKuRWpOxvgZpfo8ie35EwtFrCKd7rbXLQpmqpfVllpUnaYqxLcbYxLd73Z5Is6iGNVRQeo/nPlxZMvJqsCeEsHFirXK7Od1I080hbXE5xVwBcquI+0Dry7E3Tr/uqx7XndMDdWyjkvJP9m4hhgX8/DIzQafRzgRJVLo3Ds7PzungZR4xG1QpdPy/kHCo71YaC9RVu/cU19VFKIYeDLdJORCUnxW8qxcmIWz7aHh813GhlYNe7NsxTXMVZjn1I2qKbrfwOaWK8FXJbA4AHiAKdFaPsCOMihyeT+juck8TO2ObH+R6FahGDHAij0x37VdNsD6PLEFz6q8TIkKeBV/xE7623ewEVttLSq2L3NCLtdpgXHhk5igH/0GPSUtnarRuD/hcteIEOWxFrP4ASRK1ClTMfSM4aLzvwAw4m74 sidebar_class_name: "post api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; Create user is currently restricted to creating a user with a 'Member' role --- --- id: users-create-token title: "Create an access token for a user" description: "Create an access token for a user" sidebar_label: "Create an access token for a user" hide_title: true hide_table_of_contents: true api: 
eJzlWN9z4jYQ/lc0ms70rsMFuCR94I0kZMpMCncO9KGEMsIWoIstu5KchDL+37sr2/gHbno38JCZvARbXq12v+9bRdodDSOumBGhHHq0R2PNlf7kKs4Mn4SPXNIWNWytaW9GDb5rOm9Rzd1YCbOF0R3tx2YTKvGPdQIj8wQsIqZYwA04szYCPsCY2YA7CR+ylexji3pcu0pE6XxYTq25Ifj9Z00yE+1ueMBob0eZ749X1qnZRuhIGyXkmuKaBhZEF3/NHh6i3V0yh9FAyDsu17B0rwtv7CV/+3x5mbS+x8vDg7eY//ITTeZJiyr+dywUB6yMijmmiiNcm6vQ22J8VYMWdUNpuDQ29CjyhWtxan/TmOyulFgWSLj8xl2DgSikxgiu7Vfjl4xkHCy5SrMTQRzQ3nmnY7PL3rqX5792cKiGLVJI+EskUs6JkATIDKWnaZJTc4jJ6ximXhfppMPJXGJEMwCGeYtnkA3yaV+0y3y0mWOYKxb7gFLZLKngPUvDmydJOq6jUOoUnM+dDv5Uc72PXZdrvYp9khvT09Fha6Mh22oMzIZAUmuIWnhNc1ahChjmHsdg8J88wAfLHF8w8/9LT0TASYlvrglMAx9pcXsLo3/IyTPTBMp8LSRU4JZkXmiScRlKf1vytwxDnzOb89HqqMvAZPsSYFXJphxJZdlMMhdNKrliHnHSCj6dOgIgna0b863hu+GEKxUqkk9BjlkQ+bwWGxIXej/k0tpX/PVvFs7g63RwP7Fi1DrmZREwpRhCByQE+ogs64Tlhrh/VoPtS2IXJeGKpOEQs2GGPHPF86oVS58TKBH4kmVWTmp2II8CSQsAbC5GmDqeGUQfQBMfX4FQyLQ2cby06C7jooJoqwCkpqoc6JkVolVi91CJQ/kEgvfINeQBGhTM129OkU0xnkKZ01F/Ovlt7Az/HNy8S2k2AFtItHuMRCvQljXarLcGrZ4fanUqWXbq496bE2kluFOo83bsXA1vbgajdynNMpyFJs+P0WQBaFmQNVE1KPHiUImj0JDbMJZvT4ZFZKfQ4Gg8WdyOp6P3uT3usSwEeHGMAAs0ywIsa+lAfZdNp8ehxLsi88kgy+5tSbAW3il0OBxNBs6of7e4Hzh/DJzFwHHGzrvUZBXcXJiXxx0om+Gt/tuuSa6m1ANUru0NiTAAp3QbtRgw22Wx7mHjxf5PFNoTq23V9Gj7qdu2HaH2Lm/XJO2sDYRdIPWUN3hi5YP9xphI99ptFomzIASElRe7j2duGNCk1Da6xypIWa81j/bco6e87WPvk5wpiBSzw1pyip7LoMDQ9kg6+w7TXtzlG2ilvwDgAR92akbq7zbqG4iaOHCoJ/0vQ3CAaaZgds86Zx2ciEAFzIacLfc9OFeY2Sdr+ItpRz4T9r5sodxlFMzoUxfmWRLgt1fqmhXtuA2SBqa73ZJpPlV+kuAwQKSwRQePT0wJtrQKhy1MaHwGuldw5uMHUe33MPrBySrhI2nsyjXGn5e/xOKHI2aMb/D4yLfltp/tpm2ADAAGo0o/X6drf5qgk2L6wV6Knbt0Rh/AjsyrtvOSwL+M7VVtmXXrgrTsFHvG9gH8taGGFgsrUDu2oz6T6zgtwNQnKhGPK+WeR6rRVv5Q6lFWwQCerIVtsAJXOTZZmwjr+F94PGm8 sidebar_class_name: "post api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; Create an access token for a user --- --- id: users-delete-token title: "Invalidate a user access token" description: "Invalidate a user access token" sidebar_label: "Invalidate a user access token" hide_title: true hide_table_of_contents: true api: 
eJzlmE1v4zYQhv8KQRTobuGN493k4pu3VlADqd1V7B42cQ1aGsfc6Ksk5SYV9N87Q0m2ZKldNM4hQHyxRJEzL18++hhmPE5ACSPjaOLzIU81KP3BhwAMzOMHiHiPG3Gv+fCWGzrXfNnjGrxUSfOErRkfpWYbK/m3DYItyxx7JEKJEGMobftIvIBtZovhIryAZzbaSvrYor0thIIPM26eErqmjZLRPc97XMGfqVSA0oxKIe91hiLR9rDHfdCekkkhBZWrezCMrv+oWdnlkE0EwWxj9TXzknyD2inEH7d3d0l2nS+xNZTRNUT3mHo4wDPxWJ19vLwkad+Pcnfnr5Y//cDzZXtuS2rRSRxp0KTu4/k5/TVndJN6Hmi9SQNWdcZEXhwZiIydU5IE0rNr0f+maUzW9jdefwPP8Bx/PX7Rleez8JmL+kCb58dHBxThZWQxoxCVi3toL/Txus23wECpWLFqSI/DowiTAI605STO/18hbf9GvNF45TpfFs7NnOJJrdNCbxlRKCWecIQ0EOoTZpk31vx235FYaIodRcwmZfGGFXKY2QrD/gIF1brLdQBsg/Mx1czqk8K78F+ylYYhbUaaYz9Li94hE+//w0IZYeKwuOFrSbNyLRqO9g6GHFFVGU1aSxIHbRIn0U4E0mc/4zyQQSkC/eqI7NL4EmQupqPF/JeZO/nqjN8kmh3GHhAdnIJow9o6o928dbD6qc3qIhLlyxD8VwdpQ9xL0Hk1cz9PxmNn+ibRrNt5YPLTKUweDK0DeQRVB4kXbRKnsWFXcRq9PgwPyl6CwelsvrqaLaZv8/G49/IA4MUpAB7crANYZ6lF32XX1+Mkou9eETCnnN3rQvBI3ktwOJnOHXc6ul7dOO7vjrtyXHfmvkkmm+ZWYF6e9kHZbW/ztX2E3BGpLVfK97wwwIStEJmwpRUzZdmL5es2ppq4qIaLeg6LPd7fDfq2Tu5nVeWZ94viuJ9VZW1OlSaoXVUBpyrAoVtjEj3s90Uiz8IYvVZ+6j2ceXHI81pdfUP3Q7H+R9X1ngKKVBWzdL4GoUBZKshM27NckV9tojEmYi5+kbPRbxMcScoKJwZn52fnxGoSaxMKm6WsrL9rUsPTvTgDj6afBAJrdQxrp56V7t3y3QDHWf/wf1ir3cv9BWzc7w2gJ1sURcOybC00LFSQ59SMRYSi/Qc83Aklxdpyig8iqekY122DX27QUrh/EvF3bsnze0bJu5RXt2xENyw6kdIZHj7AU30Hg2r/Z+Xt3J94hpK9iXZfYQvCRxTIjeLyCNcsMbWBrWcwwbfnfexcO3MHu9N3R21VS8R61UFt46SpC5fK9rAbSHneNIw05vk/O5pT7Q== sidebar_class_name: "delete api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; Invalidate a user access token --- --- id: users-delete title: "Delete a user" description: "Permanently delete a user and all of their data. 
THIS CANNOT BE UNDONE" sidebar_label: "Delete a user" hide_title: true hide_table_of_contents: true api: eJzlmE1z2kgQhv/K1JySKhZMYl+44SBXqHJEVsAelnK5BqkNE+srMyNiVqX/vt36QEJoU5vAwVU+IVDP9NvvPCOmlfIoBiWMjMKpx0c80aD0Hx74YID3uAfaVTKm23jzK6hAhBAaf8+KECYYjWAi9JjwfRY9MbMFqZgnjOizxefpnH0a2/ZswW4ttrQnM9vCaY3YaD5aFdn4Q49rcBMlzR5/TPk4MdtIyX9EkXb1kGFELJQIMKPSeYwkPbEwW5wtxBul9PyyLTtX+EOarQxJHYvURoTl9MxE7FCtdrcQCD5KudnHNKU2SoYbvBOIl3sIN5hu9OHm5jcTZD2u4HsiFaDTRiVAdSnQcRRq0JT2w9UVfRzPPk9cF7R+SnxWBaMiNwoNrgSFizj2pZtnG3zTNCY9LSVafwPX4MBY0YobWWQ8mHZSdHYkd1VHPmQZ3bvu0norPObgINDmchoDrF5sOiS212GB3oNSkWLVkB6HFxHEPrS0ZSTO+6Up8/ij+caTR8f6c2nNFzSf1Dop9JYzCqXEHkdIA4E+o8r2QlSBD1lb7DhkeVLahoUcxFEY9gMUVOzItQ/sCesxVWXNonCv/Ue20jAk1kjT9rO06B0y8f4nFsoQEwfFtm4kTcu1OHK0VxvSoqoyepWDmJM4PCVxGu6ELz32CetABqXw9asjskvjJchc2uPl4vPMmf5tTd4kmh3G1ogOz0H0yNomo928dbD68ZTVZSjKvzws7LVBeiTuEnTezZzb6WRi2W8SzaadNZMfz2GyNrQJZAuqDhKvT0m0I8PuoiR8fRjWyi7BIB5IH+9meB59kwwevKwBvD4HwNrNJoBNlk7ou+k6PU4RODxj+swqq3tdCLbkXYLDqb2wHHt8/zi3nL8s59FynJnzJpk8NrcC8+a8A2W3vcd/2y3kWqRmFIyPUeqND31i3neO+GA3HOQd7CCtmqOMukhQu6pNTZSPgVtjYj0aDEQs+0GEVikvcZ/7bhTwrNH8zgnnYvlaLfBhEWmmqlGl72sQClS+qORFHlka+iVPNMFEzMEDNRt/neJIUla4OOxf9a8ItTjSBtt6Glt20pNmb9+2P6135QVfCBTlGXgxg9gX2N2jsNy8tHR7xXdDDCzeGPT4qG5He3yLFVBEmq6FhqXys4x+xoZB0RsFvNwJJcU6ZxIfOlLTNS7pE57S4Cf1vXNKdt+z//0SobOUahOHtIXxoJjQN7x8hn3zzUVGu3ALwkPbSWlxe4z9f2waA0+ehUTRAdOJdW8tyFT6/2+wU7LSqy4oQacutDGPWETPEGbZQaah76Qxy/4F/Dk/SA== sidebar_class_name: "delete api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; Permanently delete a user and all of their data. 
THIS CANNOT BE UNDONE --- --- id: users-list-tokens title: "List a user's access tokens" description: "List a user's access tokens" sidebar_label: "List a user's access tokens" hide_title: true hide_table_of_contents: true api: eJzlmE1v2zgQhv8Kocu2gDeO2+Tim1sruwZSu1XsPawRGLQ0sdlIpEpSTryG/vsOKcn6bIvAPgTIKQo1Mxy+85AW5+CIGCTVTPBJ4AydRIFUf4ZM6bl4BK6cnqPpRjnDpaOzgfueo8BPJNN7HD04o0RvhWT/2Rg4cp+iRUwljUBjLGvD8AWO6S2G4/gin8g+9pwAlC9ZnPnbF+SJ6S3jRG+BCLmhvAiPc/tbiKgzPDh6H5tASkvGN/gmos+3wDc4yfDD9fWLwqY9R8KPhElADbRMwKxBgooFV6DMZB8uL82fesy7xPdBqYckJIUx5uELroFrY07jOGS+naP/XRmfQ3sBYv0dfI2OsTS10CybMZe7tKNS0j2aMQ2R+r0/C7pEehAyotrokaBBWpSjaYgv4DlGPVZUd4Wp6zBnERCbL8m8FEE3jOFLoBqClVYvCvJEFdaHbRinYbgneZSsSjRYCR7uK/HWQoRAbRGt+yobb88HPIkMyDbIEwJsqmX/UT4Njc19WiNhaUSsraKaQW069Gy4FtsltW+uuvj5RAPioQsofT5uIiSSbjoFaAiOuwCkFJIULqboNIpDaORmKimCF4W09rV4o/HKc78t3Lu5iceUSuAEun+6ymYZCsP7tJnsiBM7KREPJEsHDwaqyRNIKPYzW4dAcM/YI8OurLqoZYuXUkkrAJ4imummnrlE75CJ97+QkPFss2bn3nHSQ16LmqK9UpAGVYXQSwuiJXHQJnHCd7gDAvIZ14EMMhqqV0dkV47nIHMxHS3mf8+8yb/u+E2i2SFsiejgFERr0lYZ7eatg9WPbVYXnOafHBC8OkhryZ2DzpuZ92kyHrvTN4lmVc6SyY+nMFkKWgWyAVUHiVdtEqdCkxuR8NeHYZnZORiczuarm9li+jaPx6OWJYBXpwBYqlkFsMpSi77rrq/HCQKHV6iQuPnqXheCjfTOweFkOne96eh2ded6/7jeyvW8mfcmmayLW4B5fdoHZbe89Z/tBnINUluq3DL83KXEXL7/wEuhvS4TXXQWIsAj13QdNmBrYhoEQ6e/G/RtG6J/KJoEaf/ogyO7oq2QyBDtt1rHatjv05hdRAKllUHiP174InLSSrPizuCflbvRsjgW3UQqWgz2ZglUgrQQGO2sZV6AL3aiMU5EPPwAJ6OvE/Q0mWULH1xcXlwaNGOhdETtLHnn49ea1PQ7ZqbhWffjkDJ7zbXrPuR6LZ3dAP2sYvh3WGmslB2bLWZhTA+HNVWwkGGammG8JEjTxcHHHZWMri2HeNAwZZ6xNA/4ZQatrI4njfPOy3l9T37TuOlcSbFdudms+EmYmP/w8RH21R5RavbbFm/eWA2TX/Z6hNLFuuLYOvVM/Y+Q/eWae5L5ma82D7IS94oHE70zKVTOWti2GKpX5Gg1Ngmm6f8fcaOl sidebar_class_name: "get api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; List a user's access tokens --- --- id: ducklings-get-duckling-config-for-user title: "Get user instances" description: "Gets instance configuration for a user. Requires 'Admin' role." 
sidebar_label: "Get user instances" hide_title: true hide_table_of_contents: true api: eJzlmEtv2zgQx78KoUtbwOtHm1x8c2sla6Br7yr2HtYIDFqibSYSqSWpPCrou+8MJUUPCykKG4sAOZkKh8OZP39kyEkdGTNFDZdiFjhjJ0j8+5CLvf5tz8y0+PgmxY7vr6RaaaacnhMw7Sse4yAYcs2MJlxoQ4XPiG9tk9wl2UlFKElgWJ947N+EK6bJh0kQcfGBKBmyPrgzdK+d8dopfWjntudo5ieKm2foSJ1JYg5S8R80n3J9m4FFTBWNmGFKWxuOscTUHMCjgA74wnltsx0ydpBHbg5cEHNgRKo9FaV7mNs/sIg649QxzzE60kaBDNAT0afvTOxhkvHny8tfcpv1HJUrADoblTDMAeSIpdCQMkz2eTjEn6bPm8QHRfQuCUlpDHGAyoYJg+Y0jkPu2zkGdxrHpMcJyO0d8w0MjBWut+H5jIrRYPMIMrOf25ars9H8B+vShokkwmWMk9DGiOYBVQE075JoK53brKHBuuXy1vZCQNqnSN3/E1LP2YXSv2+7gIFbyzqQyiP0MrSrn7cvXs+k4fMo65rorYTBFI0vujj4SgO7g5g251v/CMii+07pmrMvgWamFOzmcgiI+0SjOGSt2DIMLvgll9a+4W8y3XjuXyv3Zon+uNZJHm/hkSpFn2EESBjpE7Jsr0xpiFQ0g50IYiclckfycGCDU0MemWLlvuTbkNkDz5SZ1ZNaH3FQKWkFgNPAcNPWs5DoIzDx6RUJuYCJo/L8epk0LdaioWivEqRFVSn02oJoSRwdkzgTD4BrQL5BHsAgp6F+c0R2xXgOMlfzyWr5+8Kb/eNO3yWaHcJWiI5OQbQhbZ3Rbt46WP1yzOpK0OLqwII3B2kjuHPQebXwvs6mU3f+LtGsy1kx+eUUJitB60C2oOog8eKYxLk05Eom4u1hWEV2Dgbni+XmarGav8/j8UXLCsCLUwCs1KwDWGfpiL7LrtvjDICDp1BI3CK7t4VgK7xzcDibL11vPvm+uXG9v11v43rewnuXTDbFLcG8PO1C2S1v8992C7kWqRkawzGKVYc9szrj433sDB5GA3xJ60FaPuCzQVUbwNKAeihf/YkKYcjBmFiPBwMa834kQTGFdYy+LyMnq9USbpDqfBVbFYWXtURPZQUAv7eMKngHYrQoibUsdP3DToQ1EuLBvZpM/pzBSIwsF3PUH/aHSFwstYmonaUoTFwzY4sipJ5WYynSaoeeocCS52bYkxnEIeW2GGGVSwvR187DCAyt7PA7rlVOGlWZA6SC1mm6pZqtVJhl+Gd4QCis1EDzgSpOt5ZROIS4xjYs8Q5ubeyVHD8WwQefyE+KM53JlFtZ4EaG62KCX9C8Z8/1OlCGe/EAT25YUowv7574PotNbeDRiYgQvcB67eIbCq8ANW4KTnplA713BgXKWYulvGcC1CtjNPiNAWbZf01Ev3s= sidebar_class_name: "get api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; Gets instance configuration for a user. Requires 'Admin' role. --- --- id: ducklings-set-duckling-config-for-user title: "Set user instances" description: "Sets instance configuration for a user. 
Requires 'Admin' role" sidebar_label: "Set user instances" hide_title: true hide_table_of_contents: true api: eJztmE1v2zgQhv8KwUtbwI2dNrn45iQO1kDX6Sr2HtYIDFqibSYSqSWpfFTQf+8MJUUfFhIUNooA6SWRzOFw5p2HlDQpVTHXzAolJwEd0iDx70IhN+az4faiuDlXci02l0rPDde0RwNufC1inARTrrk1REhjmfQ58Z1tkrska6UJIwlMOyIe/z8RmhvyYRREQn4gWoUcvFm2MXS4oKULQ2961HA/0cI+wUBKR4ndKi1+sHzFxU0GFjHTLOKWa+NsBIYSM7sFjxIG4A6XdZftiHGAPAi7FZLYLSdKb5gs3cPa/pZHjA5Tap9idGSsBhVgJGKP37jcwCLDL6env+Q261GdCwAyW51wzAF/4caeqeAJl2sa9Choabm0OMTiOBS+c9W/NbhcuhunWt1y30KcscaqWgFawmhektftNGfB8gFU56/blsVaGvGDd0nFZRJhVeMkNFgBNA+YDuDyNolWit5kDUkWLZc3bhQCMj5DBn9PSD26DpV/13YBE1eOfOBWROhl4GDIr09ezqThcyfrmuithHdMizrC7/mIiZU0ee5fBgP819qYiQ+7yayTkJTG9GBQ/YHlrcGCxiddHJyxwB2+cNIcrv4RkMU2ndI1V5/BSci1hgdBOQXEfWRRHPJWbBkGF/ySS2ff8De6WHrjf+bj6xn6E8YkebyFR6Y1e4IZIGFk9siyXZnSEKloBjuSxC1K1Jrk4cDDgVnywDUv96VYhdw9K22ZWT2pxQ4HlZJOAHiSWGHbehYSfQQmPr0goZCwcFQ++54XTYtaNBTtVYK0qCqFXjgQHYnHuyRO5D3gGpBzyAMYFCw0b47IrhgPQeZ8OprP/rryJv+NL94lmh3CVoge74NoQ9o6o928dbD6dZfVuWTFaycP3hykjeAOQefllXc2ubgYT98lmnU5Kya/7sNkJWgdyBZUHSSe7JI4VZZcqkS+PQyryA7B4PRqtry8mk/f5/H4rGUF4Mk+AFZq1gGss7RD32nX2+MEgIPP6JCMi+zeFoKt8A7B4WQ6G3vT0bfl9dj7d+wtx5535b1LJpvilmCe7vdC2S1v87HdQq5FaobGcIxiwypOnM7Y+BnS/v1xH7swpp+WzZ+sX/WVsK2k78uOUaJDmLK1NjbDfp/F4ihSoJjGFtiRryKa1fpQ10h1XsVWN+q5luip7B7h/YozDd+BGC3uDa/q9YzrmpS9meYHdev7tPgM7fjM7TZsfpoOnGRYFxduUdy/XbbY4yMevNyT0fcJzEN58ooeHw2OBrhkrIyNmEu16Kxdc+uaeqSubYOHtDom9m8Q5vpa/mj7cciEa6a56qVF4Rf0/hgMXenh/7DW+Wt0FbeQCVqn6YoZPtdhluHPUBiNnUa4vGdasJXbJ3AQCoPXgNka3hz5Cyl+LGIPPpFXmoudyZTHicTDBF5ZE7yDyzv+VO9jZngebAEAwArjy4fP8yg+z9BJNX3nbM565YyR7/PYvmh7U9ti3+f45bcqmpRRvos1e8AeBPx1kSonSt4Zwt9SGjK5SfL9nLtEBvHtp7Zlii3SKy8wqU4toGDOYqbuuISildJYvEddsuwnVCzhWg== sidebar_class_name: "put api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; Sets instance configuration for a user. Requires 'Admin' role --- --- title: SQL reference sidebar_class_name: sql-reference-icon description: SQL reference for MotherDuck & DuckDB --- import DocCardList from '@theme/DocCardList'; --- --- sidebar_position: 1 title: Error Messages --- ## Connection Errors ### Disallowed connections with a different configuration If you create different connections with the same connection database path (such as `md:my_db`) but a different configuration dictionary, you may encounter the following error: ```text Connection Error: Can't open a connection to same database file with a different configuration than existing connections ``` This validation error prevents accidental retrieval of a previously cached database connection, and can happen only in DuckDB APIs that make use of a [database instance cache](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#multiple-connections-and-the-database-instance-cache). In file-based DuckDB, this can only happen when the previous connection is still in scope. With MotherDuck, the database instance cache is longer lived, so you may see this error even after the previous connections have been closed. 
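As an illustration, here is a minimal sketch of how this error can arise with the DuckDB Python client (the `threads` option is just an example of a differing configuration; it assumes a `motherduck_token` is available in your environment):

```python
import duckdb

# First connection creates a cached client-side database instance for "md:my_db".
con1 = duckdb.connect("md:my_db", config={"threads": 4})
con1.close()

# With MotherDuck, the cached instance can outlive con1, so reconnecting to the
# same path with a *different* configuration raises the connection error above.
con2 = duckdb.connect("md:my_db", config={"threads": 8})
```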
#### How To Recover

For multiple connections that are used sequentially:

* If the configuration does not need to differ, consider unifying it, which will allow the same underlying client-side database instance to be reused.
* If the configuration differs intentionally, [set the database instance TTL to zero](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#setting-custom-database-instance-cache-time-ttl) and close the previous connections.

For multiple connections whose lifecycles need to overlap, add a differentiating suffix to the connection string, so that these connections are no longer considered to be backed by the same database. A good differentiating string is the [`session_hint`](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/read-scaling.md#session-affinity-with-sessionhint). While it is meant to associate an individual end user with a dedicated backend when used with read scaling tokens, it can also be used to signal client-side intent for a distinct database instance when used with regular tokens.

---

---
sidebar_position: 1
title: FAQ
---

import Versions from '@site/src/components/Versions';

### What's the difference between `.open md:` and `ATTACH 'md:'`?

`.open` initiates a new database connection (to a given database, or `my_db` by default) and can be passed different parameters in the connection string, like `motherduck_token` or the [saas_mode](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md#authentication-using-saas-mode) flag. If you have a previous local database attached, it will be detached when using `.open`.

`ATTACH` keeps the current database connection and attaches one or more MotherDuck (cloud) databases to the current connection. You'll need to use `USE` to select the database you want to query.

### How do I know which version of DuckDB I should be running?

MotherDuck is currently compatible with a range of DuckDB client versions. Check that you have a compatible version of DuckDB running locally.

### How do I know which version of DuckDB I am running?

You can use the `VERSION` pragma to find out which version of DuckDB you are running:

```sql
PRAGMA VERSION;
```

### How do I know what's executed locally and what's executed remotely?

If you run an `EXPLAIN` on your query, you will see the physical plan. Each operation is followed by either (L) for local or (R) for remote, as shown in the query plan example below.

```sql
EXPLAIN [Your Query]
```

![explain-sample](./img/explain_sample.png)

:::note
The explain output will resemble the regular DuckDB explain output, with two main differences:

* Operations that run locally are marked as (L), and operations running remotely on the MotherDuck service are marked as (R).
* The MotherDuck DuckDB extension adds four new types of custom operators to exchange data between your local DuckDB and the MotherDuck service:
  * The `UploadSink` operator runs locally and sends data from your local DuckDB to the remote MotherDuck service.
  * The `UploadSource` operator runs remotely in the DuckDB on the MotherDuck side and consumes the uploaded data.
  * The `DownloadSink` operator runs remotely on the MotherDuck side and prepares the data to be downloaded by the local DuckDB.
  * The `DownloadSource` operator runs in your local DuckDB, fetching the data from the MotherDuck service made available via the remote `DownloadSink`.
:::

### I connect to both MotherDuck and a local database, why is there an uncheckpointed WAL left behind?
DuckDB keeps a [database instance cache](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#multiple-connections-and-the-database-instance-cache) for each unique connection path. Connecting to MotherDuck extends the lifetime of the database instance to a default of 15 minutes. If you observe a WAL file left behind for the local database after the process exits, or run into the "File is already open" error when closing and reopening the connection, there are several workarounds:

* Run `CHECKPOINT "local-database-name"` in the application code.
* Run `DETACH "local-database-name"` in the application code.
* Disable the cache lifetime extension by setting the `motherduck_dbinstance_inactivity_ttl` setting to `0s` (see [Setting Custom Database Instance Cache TTL](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#setting-custom-database-instance-cache-time-ttl)).

### Which DuckDB extensions are supported by MotherDuck?

* [Arrow](https://duckdb.org/docs/extensions/arrow)
* [AutoComplete](https://duckdb.org/docs/extensions/autocomplete)
* [AWS](https://duckdb.org/docs/extensions/aws.html)
* [Azure](https://duckdb.org/docs/extensions/azure)
* [Delta](https://duckdb.org/docs/extensions/delta)
* [Excel number and date formatting](https://duckdb.org/docs/extensions/excel)
* [Full-text search (fts)](https://duckdb.org/docs/extensions/full_text_search)
* [Iceberg](https://duckdb.org/docs/extensions/iceberg)
* [inet](https://duckdb.org/docs/extensions/inet)
* [JSON](https://duckdb.org/docs/extensions/json)
* [Parquet](https://duckdb.org/docs/extensions/parquet)
* [Spatial](https://duckdb.org/docs/extensions/spatial)
* [Substrait](https://duckdb.org/docs/extensions/substrait)
* [Time Zones and collations (icu)](https://duckdb.org/docs/extensions/icu)
* [TPC-DS data generation and queries (tpcds)](https://duckdb.org/docs/extensions/tpcds)
* [TPC-H data generation and queries (tpch)](https://duckdb.org/docs/extensions/tpch)

### Why am I not in the same Organization as my team?

If you sign up for MotherDuck directly, you create your own Organization as part of the sign-up flow. To join your team's Organization, reach out to your team and request that they [invite you to their Organization](../key-tasks/managing-organizations/managing-organizations.mdx#inviting-users-to-your-organization). As an alternative, you may reach out to [MotherDuck support](./support.md) and we can search for other users within your domain.

### How do I use my team's shared databases?

Some database shares are scoped at the `ORGANIZATION` level. To use those shares, you must be in the same Organization as the person who created the share. In addition, some shares are marked as `DISCOVERABLE`. This allows members of the same Organization to easily find those shares through the UI.

Follow the steps outlined in ["Why am I not in the same Organization as my team?"](#why-am-i-not-in-the-same-organization-as-my-team) to join your team!

### How do I delete my account?

You can delete your account and all associated information by following these steps:

1. Navigate to your personal Settings and select "Members" from the left sidebar
2. Click the three dots (⋮) next to your name
3. Select "Delete"
4. Confirm the account deletion

:::note
If you are the only member of your Organization, deleting your account will also delete the Organization.
:::

For additional assistance, please contact our [support team](./support.md).
### Why am I getting SSL errors when connecting to MotherDuck from a Docker image?

If you see SSL errors when trying to connect to MotherDuck from a Docker image, this is likely because the image does not have updated CA certificates. If the container was working and suddenly stopped, it is likely that the certificates in the image have expired. Please refer to [Docker's documentation](https://docs.docker.com/engine/network/ca-certs/) for best practices on updating CA certificates in Docker images.

Some common errors you might see indicating an issue with your CA certificates include:

* `Could not get default pem root certs.`
* `Failed to create security handshaker.`
* `Update handshaker factory failed.`

---

---
sidebar_position: 3
title: Reinstall MotherDuck extension
---

The MotherDuck extension is automatically downloaded and loaded as soon as you connect to MotherDuck. However, you can force a reinstallation by running:

```sql
FORCE INSTALL motherduck;
```

In addition, make sure you are running a currently supported [version of DuckDB](../faq#how-do-i-know-which-version-of-duckdb-i-should-be-running).

---

---
sidebar_position: 7
title: Support
---

Have a question that isn't answered in our [FAQ](./faq.md)? Join the [MotherDuck Slack Community](https://slack.motherduck.com/) or contact us at [support@motherduck.com](mailto:support@motherduck.com?subject=Support+question).

---

---
sidebar_position: 6
title: Troubleshooting Data Access Policy
---

In order to help you with certain kinds of MotherDuck issues, it can be helpful for us to access your MotherDuck account. For example, if a specific query on a specific dataset is triggering a bug, it may be necessary for us to access the data and SQL query, and possibly re-run a specific query, in order to reproduce the issue and diagnose the problem.

A MotherDuck employee may use our community Slack or email to request your permission to access your MotherDuck account while troubleshooting an issue. If you give us permission to access your MotherDuck account for troubleshooting, here is what you need to know:

- Our goal is to understand the issue and resolve the problem. We will make every effort to minimize the amount of time we spend accessing your account and the amount of data we access. We will only access the data we need to investigate and troubleshoot the specific issue.
- Any access to your data will be strictly read-only.
- A MotherDuck employee may pull in other MotherDuck employees during the debugging process. By agreeing to allow us to access your account for troubleshooting an issue, other MotherDuck employees who are asked to help investigate the issue may also access your account, subject to the same terms of this policy, without requesting additional authorization from you.
- We will not share or disclose the data we access while troubleshooting the issue to any third party or non-MotherDuck employee.
- We may make temporary copies of your data while debugging the issue. Any such copies will be permanently deleted once the issue is resolved.
- We may use the data we access in your account to generate a redacted copy of the data to be used for creating a bug report or test.
- The permission you have granted to access your account lapses once the specific issue is resolved.

---

---
title: Troubleshooting
sidebar_class_name: troubleshooting-icon
description: Troubleshooting
---

---

---
sidebar_position: 2
title: Uninstall MotherDuck extension
---

### How do I uninstall MotherDuck?
* Remove `motherduck_*` from your environment variables (most likely only `motherduck_token`) [1]
* Remove any `motherduck*.duckdb_extension` file located in `~/.duckdb` [2]

[1] To view all your environment variables you may use:

```bash
$ env | grep -i motherduck
```

To unset in the current session:

```bash
$ unset motherduck_token
```

To unset the variable permanently, you may have to check your shell initialization files (`~/.bashrc`, `~/.zshrc`, etc.)

[2] Note that those files are generally under `~/.duckdb/extensions/<version>/<platform>/`, e.g. `~/.duckdb/extensions/v0.9.1/osx_arm64`. You may use this script:

```bash
$ find ~/.duckdb -name 'motherduck*.duckdb_extension' -exec rm {} \;
```

---

---
sidebar_position: 4
title: Install certificate on Windows machines
---

In some circumstances, you may face an error that reads like `Http response at 400 or 500 level, http status code: 0`. On Windows machines, this is usually due to the [Let's Encrypt](https://letsencrypt.org/) certificate not being trusted. To fix this, please follow the steps below:

* Download this file: https://letsencrypt.org/certs/isrgrootx1.der
* Open it (double-click on the file)

![Certificate window](images/open-certificate.png)

* Click on "Install Certificate" and follow the instructions:

![Import certificate](images/certificate-import.png)

Then you should be able to try again. If it still doesn't work, check whether the certificate was correctly installed by opening certmgr (typing "`cert`" in the search box should show it):

![Manage user certificates](images/manage-user-certs.png)

It should then appear under `Trusted Root Certification Authorities\Certificates`:

![Certificates manager](images/certmgr.png)