PROMPT_SQL

Overview

The prompt_sql function allows you to generate SQL queries using natural language. Simply describe what you want to analyze in plain English, and MotherDuck AI will translate your request into a valid SQL query based on your database schema and content.

This function helps users who are less familiar with SQL syntax to generate queries and experienced SQL users save time when working with unfamiliar schemas.

Syntax

CALL prompt_sql('<natural language question>'[, include_tables=<table_specification>]);

Parameters

Parameter	Type	Description	Required
`natural language question`	STRING	Your query in plain English describing the data you want to analyze	Yes
`include_tables`	ARRAY or MAP	Specifies which tables and columns to consider for query generation. When not provided, all tables in the current database will be considered.	No

Include tables parameter

You can specify which tables and columns should be considered during SQL generation using the include_tables parameter. This is particularly useful when:

You want to focus on specific tables in a large database
You want to improve performance by reducing the schema analysis scope

The parameter accepts three formats:

Array of table names: include all columns from specified tables:
```
include_tables=['table1', 'table2']
```

Map of tables to columns: include only specific columns from tables:

include_tables={'table1': ['column1', 'column2'], 'table2': ['column3']}

Map with column regex patterns: include columns matching patterns:
```
include_tables={'table1': ['column_prefix.*', 'exact_column']}
```

Examples

Basic example

Let's start with a simple example using MotherDuck's sample Hacker News dataset:

CALL prompt_sql('what are the top domains being shared on hacker_news?');

Output:

query
SELECT regexp_extract(url, 'https?://([^/]+)') AS domain, COUNT(*) AS count FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY count DESC;

Intermediate example

This example demonstrates how to generate a more complex query with filtering, aggregation, and time-based analysis:

CALL prompt_sql('Show me the average score of stories posted by each author who has posted at least 5 stories in 2022, sorted by average score');

Output:

query
SELECT 'by', AVG(score) AS average_score FROM hn.hacker_news WHERE EXTRACT(YEAR FROM 'timestamp') = 2022 GROUP BY 'by' HAVING COUNT(id) >= 5 ORDER BY average_score;

Advanced Example: Multi-table Analysis with Specific Columns

This example shows how to generate a query that focuses on specific columns:

CALL prompt_sql(
  'Find the top 10 users who submitted the most stories with the highest average scores in 2023',
  include_tables={
    'hn.hacker_news': ['id', 'by', 'score', 'timestamp', 'type', 'title']
  }
);

Output:

query
SELECT "by", AVG(score) AS avg_score, COUNT(*) AS story_count FROM hn.hacker_news WHERE "type" = 'story' AND EXTRACT(YEAR FROM "timestamp") = 2023 GROUP BY "by" ORDER BY story_count DESC, avg_score DESC LIMIT 10;

Expert example

This example demonstrates generating a complex query with subqueries, window functions, and complex logic:

CALL prompt_sql('For each month in 2022, show me the top 3 users who posted stories with the highest scores, and how their average score compares to the previous month');

Output:

query
WITH monthly_scores AS ( SELECT "by" AS user, DATE_TRUNC('month', "timestamp") AS month, AVG(score) AS avg_score FROM hn.hacker_news WHERE "type" = 'story' AND DATE_PART('year', "timestamp") = 2022 GROUP BY user, month ), ...

Failure example

This example shows that for some complex queries, the model might not generate a valid SQL query. Therefore the output will be the following error message:

CALL prompt_sql('Identify the most discussed technology topics in Hacker News stories from the past year based on title keywords, and show which days of the week have the highest engagement for each topic');

Output:

query
Invalid Input Error: The AI could not generate valid SQL. Try re-running the command or rephrasing your question.

To generate a valid SQL query, you can try to break down the question into simpler parts.

Best practices

Be specific in your questions: the more specific your natural language query, the more accurate the generated SQL will be.
Start simple and iterate: begin with basic queries and gradually add complexity as needed.
Use the include_tables parameter: when working with large databases, specify relevant tables to improve performance and accuracy.
Review generated SQL: always review the generated SQL before executing it, especially for complex queries.
Understand your schema: knowing your table structure helps you phrase questions that align with available data.
Use domain-specific terminology: include field names in your questions when possible.
Provide context in your questions: mention time periods, specific metrics, or business context to get more relevant results.

Notes

By default, all tables in the current database are considered. Use the include_tables parameter to narrow the scope.
To target a specific database, first execute the USE <database_name> command (learn more about switching databases).
The quality of generated SQL depends on the clarity of your natural language question and the quality of your database schema (table and column names).
This feature is powered by MotherDuck's integration with OpenAI's language models.

Troubleshooting

If you encounter issues with the prompt_sql function, consider the following troubleshooting steps:

Check your database schema: ensure that the tables and columns you're querying are present in the current database.
Be specific in your questions: the more specific your natural language query, the more accurate the generated SQL will be.
Use the include_tables parameter: when working with large databases, specify relevant tables to improve performance and accuracy.

Overview​

Syntax​

Parameters​

Include tables parameter​

Examples​

Basic example​

Intermediate example​

Advanced Example: Multi-table Analysis with Specific Columns​

Expert example​

Failure example​

Best practices​

Notes​

Troubleshooting​

Overview

Syntax

Parameters

Include tables parameter

Examples

Basic example

Intermediate example

Advanced Example: Multi-table Analysis with Specific Columns

Expert example

Failure example

Best practices

Notes

Troubleshooting