PROMPT
This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility.
Prompt Function

The `prompt` function allows you to interact with Large Language Models (LLMs) directly from SQL. You can generate both free-form text and structured data outputs.

The function uses OpenAI's models: `gpt-4o-mini` (default) or `gpt-4o`. Both models support single- and multi-row inputs, enabling batch processing.
Consumption is measured in AI Units. One AI Unit equates to approximately:

- 4,000 prompt responses with `gpt-4o-mini`
- 250 prompt responses with `gpt-4o`

These estimates assume an input size of 1,000 characters and a response size of 250 characters.
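Under those stated averages, the AI-Unit cost of a batch job can be estimated with simple arithmetic. A minimal Python sketch (the function name is ours for illustration, not part of the product, and the figures are estimates, not billing guarantees):

```python
# Documented averages: responses per AI Unit, assuming ~1,000-char
# inputs and ~250-char responses. Estimates only, not billing guarantees.
RESPONSES_PER_AI_UNIT = {"gpt-4o-mini": 4000, "gpt-4o": 250}

def estimated_ai_units(num_rows: int, model: str = "gpt-4o-mini") -> float:
    """Rough AI Units consumed by num_rows prompt() calls."""
    return num_rows / RESPONSES_PER_AI_UNIT[model]

print(estimated_ai_units(100))            # 100 rows on gpt-4o-mini -> 0.025
print(estimated_ai_units(100, "gpt-4o"))  # same batch on gpt-4o    -> 0.4
```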
Syntax

SELECT prompt('Write a poem about ducks'); -- returns a single-cell table with the response
Parameters

| Parameter | Required | Description |
|---|---|---|
| `prompt_text` | Yes | The text input to send to the model. |
| `model` | No | Model type: `'gpt-4o-mini'` (default) or `'gpt-4o-2024-08-06'` (alias: `'gpt-4o'`). |
| `temperature` | No | Model temperature between `0` and `1`, default `0.1`. Lower values produce more deterministic outputs. |
| `struct` | No | Output schema as a struct, e.g. `{summary: 'VARCHAR', persons: 'VARCHAR[]'}`. Results in `STRUCT` output. |
| `struct_descr` | No | Descriptions for struct fields that will be added to the model's context, e.g. `{summary: 'a 1 sentence summary of the text', persons: 'an array of all persons mentioned in the text'}`. |
| `json_schema` | No | A JSON schema that adheres to OpenAI's structured output guide. Provides more flexibility than the `struct`/`struct_descr` parameters. Results in `JSON` output. |
Return Types

The `prompt` function can return different data types depending on the parameters used:

- Without structure parameters: returns `VARCHAR`
- With the `struct` parameter: returns a `STRUCT` with the specified schema
- With the `json_schema` parameter: returns `JSON`
Examples

Basic Text Generation
-- Call gpt-4o-mini (default) to generate text
SELECT prompt('Write a poem about ducks') AS response;
-- Call gpt-4o with higher temperature for more creative outputs
SELECT prompt('Write a poem about ducks', model:='gpt-4o', temperature:=1) AS response;
Structured Output with Struct
-- Extract structured information from text using struct parameter
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'},
struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review;
This returns a `STRUCT` value that can be accessed with dot notation:
SELECT
zoo_review.summary,
zoo_review.favourite_animals,
zoo_review.star_rating
FROM (
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'},
struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review
);
Structured Output with JSON Schema
-- Extract structured information using JSON schema
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
json_schema := '{
"name": "zoo_visit_review",
"schema": {
"type": "object",
"properties": {
"summary": { "type": "string" },
"sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] },
"animals_seen": { "type": "array", "items": { "type": "string" } }
},
"required": ["summary", "sentiment", "animals_seen"],
"additionalProperties": false
},
"strict": true
}') AS json_review;
This returns a `JSON` value that, once stored, can be accessed using JSON extraction functions:
SELECT
json_extract_string(json_review, '$.summary') AS summary,
json_extract_string(json_review, '$.sentiment') AS sentiment,
json_extract(json_review, '$.animals_seen') AS animals_seen
FROM (
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
json_schema := '{ ... }') AS json_review
);
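The JSON path expressions above behave like ordinary key lookups. A small Python sketch on an illustrative response (the literal value below is made up; real model output will differ) shows the equivalence:

```python
import json

# Illustrative value of the shape the json_schema example returns;
# the actual model output will vary.
json_review = '''{
  "summary": "A very positive zoo visit with friendly staff.",
  "sentiment": "positive",
  "animals_seen": ["elephants", "tigers", "penguins"]
}'''

doc = json.loads(json_review)
# Mirrors json_extract_string(json_review, '$.summary'), etc.
print(doc["summary"])
print(doc["sentiment"])
print(doc["animals_seen"])
```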
Use Cases

Text Generation

Using the prompt function to write a poem about ducks:

-- Prompt the LLM to write a poem about ducks
SELECT prompt('Write a poem about ducks') AS response;

| response |
|---|
| 'Beneath the whispering willow trees, Where ripples dance with wayward breeze, A symphony of quacks arise [...]' |
Summarization

We use the prompt function to create a one-sentence summary of movie descriptions. The example is based on the sample movies dataset from MotherDuck's sample data database.

-- Create a new table with summaries for the first 100 overview texts
CREATE TABLE my_db.movies AS
SELECT title,
overview,
prompt('Summarize this movie description in one sentence: ' || overview) AS summary
FROM kaggle.movies
LIMIT 100;
If write access to the source table is available, the summary column can also be added in place:
-- Update the existing table to add a new column for summaries
ALTER TABLE my_db.movies ADD COLUMN summary VARCHAR;
-- Populate the column with summaries
UPDATE my_db.movies
SET summary = prompt('Summarize this movie description in one sentence: ' || overview);
The movies table now contains a new column `summary` with one-sentence summaries of the movies:
SELECT title, overview, summary
FROM my_db.movies;
| title | overview | summary |
|---|---|---|
| Toy Story | Led by Woody, Andy's toys live happily [...] | In "Toy Story," Woody's jealousy of the new [...] |
| Jumanji | When siblings Judy and Peter discover [...] | In this thrilling adventure, siblings Judy and [...] |
| ... | ... | ... |
Structured Data Extraction

The prompt function can be used to extract structured data from text.

The example is based on the same sample movies dataset from MotherDuck's sample data database. This time we aim to extract structured metadata from each movie's overview description. We are interested in the main characters mentioned in the description, the movie's genre, and a rating of how much action the movie contains, on a scale from 1 (no action) to 5 (a lot of action).

For this, we make use of the `struct` and `struct_descr` parameters, which produce structured output.
-- Update the existing table to add a new column for structured metadata
ALTER TABLE my_db.movies ADD COLUMN metadata STRUCT(main_characters VARCHAR[], genre VARCHAR, action INTEGER);
-- Populate the column with structured information
UPDATE my_db.movies
SET metadata = prompt(overview,
struct:={main_characters: 'VARCHAR[]', genre: 'VARCHAR', action: 'INTEGER'},
struct_descr:={
main_characters: 'an array of the main character names mentioned in the movie description',
genre: 'the primary genre of the movie based on the description',
action: 'rate on a scale from 1 (no action) to 5 (high action) how much action the movie contains'
});
The resulting `metadata` field is a `STRUCT` that can be accessed as follows:
SELECT title,
overview,
metadata.main_characters,
metadata.genre,
metadata.action
FROM my_db.movies;
| title | overview | metadata.main_characters | metadata.genre | metadata.action |
|---|---|---|---|---|
| Toy Story | Led by Woody, Andy's toys live happily [...] | ['"Woody"', '"Buzz Lightyear"', '"Andy"', '"Mr. Potato Head"', '"Rex"'] | Animation, Adventure, Comedy | 3 |
| Jumanji | When siblings Judy and Peter discover [...] | ['"Judy Shepherd"', '"Peter Shepherd"', '"Alan Parrish"'] | Adventure, Fantasy, Family | 4 |
| ... | ... | ... | ... | ... |
Batch Processing

The `prompt` function can process multiple rows in a single query:

-- Process multiple rows at once
SELECT
title,
prompt('Write a tagline for this movie: ' || overview) AS tagline
FROM kaggle.movies
LIMIT 10;
Retrieval-Augmented Generation (RAG)

The `prompt` function can be combined with similarity search on embeddings to build a RAG pipeline:
-- Create a reusable macro for question answering
CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
SELECT question_text AS question, prompt(
'User asks the following question:\n' || question_text || '\n\n' ||
'Here is some additional information:\n' ||
STRING_AGG('Title: ' || title || '; Description: ' || overview, '\n') || '\n' ||
'Please answer the question based only on the additional information provided.',
model := 'gpt-4o'
) AS response
FROM (
SELECT title, overview
FROM kaggle.movies
ORDER BY array_cosine_similarity(overview_embeddings, embedding(question_text)) DESC
LIMIT 3
)
);
-- Use the macro to answer questions
SELECT question, response
FROM ask_question('Can you recommend some good sci-fi movies about AI?');
This will result in the following output:

| question | response |
|---|---|
| Can you recommend some good sci-fi movies about AI? | Based on the information provided, here are some sci-fi movies about AI that you might enjoy: [...] |
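The `ORDER BY array_cosine_similarity(...) DESC LIMIT 3` step ranks overviews by how close their embeddings are to the question's embedding. Conceptually, it computes this quantity (toy three-dimensional vectors for illustration; real `embedding()` output has far more dimensions):

```python
import math

def cosine_similarity(a, b):
    # Same quantity array_cosine_similarity computes: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings, not real embedding() output
question = [0.9, 0.1, 0.0]
overviews = {
    "AI thriller": [0.8, 0.2, 0.1],
    "Romantic comedy": [0.1, 0.9, 0.3],
}

# Highest similarity first, like ORDER BY ... DESC
ranked = sorted(overviews,
                key=lambda t: cosine_similarity(overviews[t], question),
                reverse=True)
print(ranked)  # the AI thriller ranks first for an AI question
```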
When passing free-text arguments from external sources to the prompt function (e.g., user questions in a RAG application), always use prepared statements to prevent SQL injection.
Using prepared statements in Python:
# First register the macro
con.execute("""
CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
-- Macro definition as above
);
""")
# Then use prepared statements for user input
user_query = "Can you recommend some good sci-fi movies about AI?"
result = con.execute("""
SELECT response FROM ask_question(?)
""", [user_query]).fetchone()
print(result[0])
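To see why parameter binding matters, here is a sketch using stdlib `sqlite3` (chosen only because it ships with Python; DuckDB's `con.execute(query, [params])` binds the same way): the hostile string is stored as plain data and never executed as SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE questions (text TEXT)")

# Hostile input that would break a naively concatenated statement
user_query = "sci-fi'); DROP TABLE questions; --"

# With a bound parameter, the value stays data, never SQL
con.execute("INSERT INTO questions VALUES (?)", [user_query])

stored = con.execute("SELECT text FROM questions").fetchone()[0]
print(stored == user_query)  # True: stored verbatim, no injection occurred
```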
Error Handling

When usage limits have been reached or an unexpected error occurs while computing prompt responses, the function will not fail the entire query; instead, it returns `NULL` for the affected rows. To verify that all responses were computed successfully, check the resulting column for `NULL` values.
-- Check for NULL values in the summary column
SELECT count(*)
FROM my_db.movies
WHERE summary IS NULL AND overview IS NOT NULL;
Missing values can be filled in with a separate query:
-- Fill in missing prompt responses
UPDATE my_db.movies
SET summary = prompt('Summarize this movie description in one sentence: ' || overview)
WHERE summary IS NULL AND overview IS NOT NULL;
Performance Considerations

- Batch Processing: When processing multiple rows, consider using `LIMIT` to control the number of API calls.
- Model Selection: Use `gpt-4o-mini` for faster, less expensive responses when high accuracy isn't critical.
- Caching: Results are not cached between queries, so consider storing results in tables for repeated use.
Notes
These capabilities are provided by MotherDuck's integration with OpenAI. Inputs to the prompt function will be processed by OpenAI.
For availability and usage limits, see MotherDuck's Pricing Model. If higher usage limits are needed, please reach out directly to the Slack support channel or email support@motherduck.com.