PROMPT
This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility.
Prompt Function
The prompt function allows you to interact with Large Language Models (LLMs) directly from SQL. You can generate both free-form text and structured data outputs.
The function supports OpenAI's gpt-5 series (gpt-5, gpt-5-mini, gpt-5-nano), gpt-4o-mini (default), gpt-4o, and the gpt-4.1 series (gpt-4.1, gpt-4.1-mini, gpt-4.1-nano). All models support single- and multi-row inputs, enabling batch processing.
Consumption is measured in AI Units. When reasoning over table rows, one AI Unit equates to approximately:
- 480 row responses with gpt-4o
- 8,000 row responses with gpt-4o-mini
- 600 row responses with gpt-4.1
- 3,000 row responses with gpt-4.1-mini
- 12,000 row responses with gpt-4.1-nano
- 720 row responses with gpt-5
- 3,600 row responses with gpt-5-mini
- 18,000 row responses with gpt-5-nano
These estimates assume an input size of 1,000 characters and response size of 250 characters.
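As a back-of-envelope illustration (a sketch assuming the per-unit rates above and comparable input/output sizes; actual consumption varies with prompt and response length), you can estimate AI Unit usage directly in SQL:
-- Rough estimate: 4,000 responses with gpt-4o-mini at ~8,000 row responses per AI Unit
SELECT 4000 / 8000.0 AS estimated_ai_units;
-- Returns: 0.5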
Syntax
SELECT prompt('Write a poem about ducks'); -- returns a single-cell table with the response
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt_text | Yes | The text input to send to the model |
| model | No | Model type: 'gpt-5', 'gpt-5-mini', 'gpt-5-nano', 'gpt-4o-mini' (default), 'gpt-4o', 'gpt-4.1', 'gpt-4.1-mini', or 'gpt-4.1-nano' |
| temperature | No | Model temperature value between 0 and 1, default: 0.1. Lower values produce more deterministic outputs. Not supported with GPT-5 models (use reasoning_effort instead). |
| reasoning_effort | No | Controls reasoning depth for GPT-5 series models only. Valid values: 'minimal' (default), 'low', 'medium', 'high'. Higher effort may improve accuracy for complex tasks. |
| return_type | No | Specifies the exact SQL type to return (e.g., 'INTEGER', 'BOOLEAN', 'DATE', 'VARCHAR[]', 'STRUCT(name VARCHAR, age INTEGER)'). Supports most DuckDB types including primitives, arrays, structs, and enums. Mutually exclusive with struct and json_schema. |
| struct | No | Output schema as struct, e.g. {summary: 'VARCHAR', persons: 'VARCHAR[]'}. Results in STRUCT output. Mutually exclusive with return_type and json_schema. |
| struct_descr | No | Descriptions for struct fields that will be added to the model's context, e.g. {summary: 'a 1 sentence summary of the text', persons: 'an array of all persons mentioned in the text'} |
| json_schema | No | A JSON schema that adheres to OpenAI's structured output guide. Provides more flexibility than the struct/struct_descr parameters. Results in JSON output. Mutually exclusive with return_type and struct. |
Note: The return_type and struct parameters support enum types for classification tasks. Define enum types first using CREATE TYPE, then reference them in the struct schema (e.g., sentiment: 'sentiment_enum' or categories: 'category_enum[]' for arrays).
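For instance, a minimal sketch of enum-based classification with return_type (the priority_level type and prompt text below are illustrative, not from the examples in this guide):
-- Define a hypothetical enum for illustration
CREATE TYPE priority_level AS ENUM ('low', 'medium', 'high');
-- The response is constrained to the enum's values
SELECT prompt('Classify the urgency of this message: "The server is down!"',
    return_type := 'priority_level') AS priority;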
Return Types
The prompt function can return different data types depending on the parameters used:
- Without structure parameters: Returns VARCHAR
- With return_type parameter: Returns the exact SQL type specified (e.g., INTEGER, BOOLEAN, DATE, VARCHAR[], STRUCT(...))
- With struct parameter: Returns a STRUCT with the specified schema
- With json_schema parameter: Returns JSON
Note: The return_type, struct, and json_schema parameters are mutually exclusive - only one can be used at a time.
Example Usage
Basic Text Generation
-- Call gpt-4o-mini (default) to generate text
SELECT prompt('Write a poem about ducks') AS response;
-- Call gpt-4o with higher temperature for more creative outputs
SELECT prompt('Write a poem about ducks', model:='gpt-4o', temperature:=1) AS response;
Structured Output with Struct
-- Extract structured information from text using struct parameter
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'},
struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review;
This returns a STRUCT value that can be accessed with dot notation:
SELECT
zoo_review.summary,
zoo_review.favourite_animals,
zoo_review.star_rating
FROM (
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'},
struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review
);
Structured Output with JSON Schema
-- Extract structured information using JSON schema
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
json_schema := '{
"name": "zoo_visit_review",
"schema": {
"type": "object",
"properties": {
"summary": { "type": "string" },
"sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] },
"animals_seen": { "type": "array", "items": { "type": "string" } }
},
"required": ["summary", "sentiment", "animals_seen"],
"additionalProperties": false
},
"strict": true
}') AS json_review;
This returns a JSON value that can be accessed using JSON extraction functions:
SELECT
json_extract_string(json_review, '$.summary') AS summary,
json_extract_string(json_review, '$.sentiment') AS sentiment,
json_extract(json_review, '$.animals_seen') AS animals_seen
FROM (
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
json_schema := '{ ... }') AS json_review
);
Typed Output with Return Type
The return_type parameter allows you to specify the exact SQL type for the model's response, providing strong typing for single-value extractions:
-- Extract an integer from text
SELECT prompt('The answer is 42', return_type := 'INTEGER') AS answer;
-- Returns: 42 (as INTEGER type)
-- Extract a boolean
SELECT prompt('Is the sky blue?', return_type := 'BOOLEAN') AS is_blue;
-- Returns: true (as BOOLEAN type)
-- Extract a date
SELECT prompt('When is January 15, 2025?', return_type := 'DATE') AS event_date;
-- Returns: 2025-01-15 (as DATE type)
-- Extract multiple structured fields
SELECT prompt(
'John is 30 years old and lives in NYC',
return_type := 'STRUCT(name VARCHAR, age INTEGER, city VARCHAR)'
) AS person_info;
-- Returns: {'name': 'John', 'age': 30, 'city': 'NYC'} (as STRUCT type)
-- Extract arrays
SELECT prompt('List the days of the week', return_type := 'VARCHAR[]') AS weekdays;
-- Returns: ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
The return_type parameter supports most DuckDB types including:
- Primitives: VARCHAR, INTEGER, BIGINT, DOUBLE, BOOLEAN, DATE, TIMESTAMP, etc.
- Arrays: INTEGER[], VARCHAR[], DOUBLE[], etc.
- Structs: STRUCT(field1 TYPE1, field2 TYPE2, ...)
- Enums: Custom enum types created with CREATE TYPE
GPT-5 Reasoning Effort
The reasoning_effort parameter controls how much computational effort GPT-5 models spend on reasoning. This is only available for GPT-5 series models (gpt-5, gpt-5-mini, gpt-5-nano):
-- Use minimal reasoning (fastest, default)
SELECT prompt('What is 2+2?', 'gpt-5-mini',
reasoning_effort := 'minimal',
return_type := 'INTEGER') AS result;
-- Use low reasoning for simple tasks
SELECT prompt('Count the letters in "hello"', 'gpt-5-nano',
reasoning_effort := 'low',
return_type := 'INTEGER') AS letter_count;
-- Use medium reasoning for moderate complexity
SELECT prompt('Calculate 5 factorial', 'gpt-5-mini',
reasoning_effort := 'medium',
return_type := 'INTEGER') AS factorial;
-- Use high reasoning for complex tasks
SELECT prompt('Solve this logic puzzle: ...', 'gpt-5',
reasoning_effort := 'high') AS solution;
Note: The reasoning_effort parameter cannot be used with non-GPT-5 models, and temperature cannot be used with GPT-5 models. They are mutually exclusive ways of controlling model behavior.
Use Cases
Text Generation
Using the prompt function to write a poem about ducks:
--- Prompt LLM to write a poem about ducks
SELECT prompt('Write a poem about ducks') AS response;
| response |
|---|
| 'Beneath the whispering willow trees, Where ripples dance with wayward breeze, A symphony of quacks arise [...]' |
Summarization
We use the prompt function to create a one-sentence summary of movie descriptions. The example is based on the sample movies dataset from MotherDuck's sample data database.
--- Create a new table with summaries for the first 100 overview texts
CREATE TABLE my_db.movies AS
SELECT title,
overview,
prompt('Summarize this movie description in one sentence: ' || overview) AS summary
FROM kaggle.movies
LIMIT 100;
If write access to the source table is available, the summary column can also be added in place:
--- Update the existing table to add new column for summaries
ALTER TABLE my_db.movies ADD COLUMN summary VARCHAR;
--- Populate the column with summaries
UPDATE my_db.movies
SET summary = prompt('Summarize this movie description in one sentence: ' || overview);
The movies table now contains a new column summary with one-sentence summaries of the movies:
SELECT title, overview, summary
FROM my_db.movies;
| title | overview | summary |
|---|---|---|
| Toy Story | Led by Woody, Andy's toys live happily [...] | In "Toy Story," Woody's jealousy of the new [...] |
| Jumanji | When siblings Judy and Peter discover [...] | In this thrilling adventure, siblings Judy and [...] |
| ... | ... | ... |
Structured Data Extraction
The prompt function can be used to extract structured data from text.
The example is based on the same sample movies dataset from MotherDuck's sample data database. This time we aim to extract structured metadata from each movie's overview description.
We are interested in the main characters mentioned in the descriptions, as well as the movie's genre and a rating of how much action the movie contains, on a scale from 1 (no action) to 5 (a lot of action).
For this, we make use of the struct and struct_descr parameters, which will result in structured output.
--- Update the existing table to add new column for structured metadata
ALTER TABLE my_db.movies ADD COLUMN metadata STRUCT(main_characters VARCHAR[], genre VARCHAR, action INTEGER);
--- Populate the column with structured information
UPDATE my_db.movies
SET metadata = prompt(
overview,
struct:={main_characters: 'VARCHAR[]', genre: 'VARCHAR', action: 'INTEGER'},
struct_descr:={
main_characters: 'an array of the main character names mentioned in the movie description',
genre: 'the primary genre of the movie based on the description',
action: 'rate on a scale from 1 (no action) to 5 (high action) how much action the movie contains'
}
);
The resulting metadata field is a STRUCT that can be accessed as follows:
SELECT title,
overview,
metadata.main_characters,
metadata.genre,
metadata.action
FROM my_db.movies;
| title | overview | metadata.main_characters | metadata.genre | metadata.action |
|---|---|---|---|---|
| Toy Story | Led by Woody, Andy's toys live happily [...] | ['"Woody"', '"Buzz Lightyear"', '"Andy"', '"Mr. Potato Head"', '"Rex"'] | Animation, Adventure, Comedy | 3 |
| Jumanji | When siblings Judy and Peter discover [...] | ['"Judy Shepherd"', '"Peter Shepherd"', '"Alan Parrish"'] | Adventure, Fantasy, Family | 4 |
| ... | ... | ... | ... | ... |
Classification with Enums
The prompt function supports enum types for classification tasks, ensuring consistent and constrained outputs. This is particularly useful for sentiment analysis, categorization, and other classification scenarios.
Sentiment Analysis
-- Define an enum for sentiment classification
CREATE TYPE sentiment_type AS ENUM ('positive', 'negative', 'neutral');
-- Classify customer reviews
SELECT
review_text,
prompt(
'Classify the sentiment of this review: ' || review_text,
struct := {sentiment: 'sentiment_type'}
).sentiment AS sentiment
FROM (
VALUES
('The product is amazing, I love it!'),
('Terrible quality, waste of money.'),
('It works fine, nothing special.')
) AS reviews(review_text);
This returns:
| review_text | sentiment |
|---|---|
| The product is amazing, I love it! | positive |
| Terrible quality, waste of money. | negative |
| It works fine, nothing special. | neutral |
Extracting Multiple Categories
Use enum arrays to extract multiple instances of the same category from text:
-- Define enums for different types of skills mentioned in text
CREATE TYPE skill_type AS ENUM ('sql', 'python', 'javascript', 'react', 'aws', 'docker', 'git');
CREATE TYPE topic_type AS ENUM ('database', 'frontend', 'backend', 'devops', 'analytics', 'security');
-- Extract skills and topics from job descriptions
SELECT
description,
prompt(
'Extract the technical skills and topics mentioned in this text: ' || description,
struct := {
skills: 'skill_type[]',
topics: 'topic_type[]'
}
) AS extracted
FROM (
VALUES
('Looking for a developer with Python and SQL experience for database analytics work'),
('Frontend role using React and JavaScript, plus Git for version control'),
('DevOps engineer needed for AWS and Docker deployment automation')
) AS jobs(description);
This returns arrays of enum values:
| description | extracted.skills | extracted.topics |
|---|---|---|
| Looking for a developer with Python and SQL experience for database analytics work | ['python', 'sql'] | ['database', 'analytics'] |
| Frontend role using React and JavaScript, plus Git for version control | ['javascript', 'react', 'git'] | ['frontend'] |
| DevOps engineer needed for AWS and Docker deployment automation | ['aws', 'docker'] | ['devops'] |
Retrieval-Augmented Generation (RAG)
The prompt function can be combined with similarity search on embeddings to build a RAG pipeline. For advanced retrieval strategies including hybrid search, reranking, and HyDE, see the Text Search guide.
-- Create a reusable macro for question answering
CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
SELECT question_text AS question, prompt(
'User asks the following question:\n' || question_text || '\n\n' ||
'Here is some additional information:\n' ||
STRING_AGG('Title: ' || title || '; Description: ' || overview, '\n') || '\n' ||
'Please answer the question based only on the additional information provided.',
model := 'gpt-4o'
) AS response
FROM (
SELECT title, overview
FROM kaggle.movies
ORDER BY array_cosine_similarity(overview_embeddings, embedding(question_text)) DESC
LIMIT 3
)
);
-- Use the macro to answer questions
SELECT question, response
FROM ask_question('Can you recommend some good sci-fi movies about AI?');
This will result in the following output:
| question | response |
|---|---|
| Can you recommend some good sci-fi movies about AI? | Based on the information provided, here are some sci-fi movies about AI that you might enjoy: [...] |
When passing free-text arguments from external sources to the prompt function (e.g., user questions in a RAG application), always use prepared statements to prevent SQL injection.
Using prepared statements in Python:
import duckdb

# Connect to MotherDuck (assumes a configured motherduck_token)
con = duckdb.connect("md:")

# First register the macro
con.execute("""
CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
-- Macro definition as above
);
""")
# Then use prepared statements for user input
user_query = "Can you recommend some good sci-fi movies about AI?"
result = con.execute("""
SELECT response FROM ask_question(?)
""", [user_query]).fetchone()
print(result[0])
Batch Processing
The prompt function can process multiple rows in a single query:
--- Process multiple rows at once
SELECT
title,
prompt('Write a tagline for this movie: ' || overview) AS tagline
FROM kaggle.movies
LIMIT 10;
Error Handling
When usage limits have been reached or an unexpected error occurs while computing prompt responses, the function does not fail the entire query; instead, it returns NULL for the affected rows. To verify that all responses were computed successfully, check the resulting column for NULL values.
-- Check for NULL values in the summary column
SELECT count(*)
FROM my_db.movies
WHERE summary IS NULL AND overview IS NOT NULL;
Missing values can be filled in with a separate query:
-- Fill in missing prompt responses
UPDATE my_db.movies
SET summary = prompt('Summarize this movie description in one sentence: ' || overview)
WHERE summary IS NULL AND overview IS NOT NULL;
Performance Considerations
- Batch Processing: When processing multiple rows, consider using LIMIT to control the number of API calls.
- Model Selection: Use gpt-4o-mini for faster, less expensive responses when high accuracy isn't critical.
- Caching: Results are not cached between queries, so consider storing results in tables for repeated use (see the sketch below).
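As a sketch of the caching advice above (the my_db.taglines table name is illustrative), persist responses once so that later queries read the stored table instead of re-calling the model:
--- Persist prompt responses once
CREATE TABLE my_db.taglines AS
SELECT title,
       prompt('Write a tagline for this movie: ' || overview) AS tagline
FROM kaggle.movies
LIMIT 10;
--- Subsequent reads incur no model calls
SELECT title, tagline
FROM my_db.taglines;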
Notes
These capabilities are provided by MotherDuck's integration with Azure OpenAI. Inputs to the prompt function will be processed by Azure OpenAI.
For availability and usage limits, see MotherDuck's Pricing Model.
Usage limits are in place to safeguard your spend, not because of throughput limitations. MotherDuck has the capacity to handle high-volume prompt workloads and is always open to working alongside customers to support any type of workload and model requirements.
If you need higher usage limits or have specific requirements, please see our support page.
Regional Processing
Requests are processed based on the region of the MotherDuck organization according to the table below. Functions that are not available within the region (no checkmark) will be processed with global compute resources.
| Function | Global | Europe |
|---|---|---|
| PROMPT | ✓ | ✓ |