Skip to main content

PROMPT

💡Preview Feature

This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility.

Prompt Function​

Large Language Models (LLMs) can be prompted, using the prompt function. Outputs can be either text or structured data.

The prompt function uses OpenAI's gpt-4o-mini or gpt-4o. Both models support constant and single-row inputs. Multi-row (batch) processing is only permitted with gpt-4o-mini.

Consumption is measured in compute units (CU). One CU hour equates to approx. 1k prompt responses with gpt-4o-mini or 50 prompt responses with gpt-4o, assuming an input size of 1000 characters and response size of 250 characters.

Syntax​

SELECT prompt('Write a poem about ducks'); -- returns a single cell table with the response

Optional parameters​

ParameterDescription
modelModel type, either 'gpt-4o-mini' (default), or 'gpt-4o-2024-08-06' (alias: 'gpt-4o')
temperatureModel temperature value between 0 and 1, default: 0.1
structOutput schema as struct, e.g. {summary: 'VARCHAR', persons: 'VARCHAR[]'}. Will result in STRUCT output.
struct_descrDescriptions for struct fields that will be added to the model's context, e.g. {summary: 'a 1 sentence summary of the text', persons: 'an array of all persons mentioned in the text'}
json_schemaA json schema that adheres to this guide. Provides more flexibility than the struct/struct_descr parameters. Will result in JSON output.

Examples:

-- Call gpt-4o with temperature 1
SELECT prompt('Write a poem about ducks', model:='gpt-4o', temperature:=1);
-- returns a VARCHAR value
-- Call gpt-4o-mini with structured output
SELECT prompt('My zoo visit was amazing, my favourite animals were [...]',
struct:={summary: 'VARCHAR', favourite_animals:='VARCHAR[]', star_rating:='INTEGER'},
struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'});
-- returns a STRUCT value
-- Call gpt-4o-mini with json-based structured output
SELECT prompt('My zoo visit was amazing, my favourite animals were [...]',
json_schema := '{
"name": "zoo_visit_review",
"schema": {
"type": "object",
"properties": {
"summary": { "type": "string" },
"sentiment": { "type": "string", "enum": ["pos", "neg", "neutral"] },
},
"required": ["summary", "sentiment"],
"additionalProperties": false
},
"strict": true
}') FROM random_strings LIMIT 10;
SELECT prompt('text');
-- returns a JSON value

Text Generation​

Using the prompt function to write a poem about ducks.

--- prompts LLM to write a poem about ducks
SELECT prompt('Write a poem about ducks') as response
response
'Beneath the whispering willow trees, Where ripples dance with wayward breeze, A symphony of quacks arise [...]'

Summarization​

We use the prompt function to create a one-sentence summary of movie descriptions. The example is based on the sample movies dataset from MotherDuck's sample data database.

--- create a new table with summaries for the first 100 overview texts
CREATE TABLE my_db.movies AS
SELECT title,
overview,
prompt('summarize review in one sentence' || overview) as summary,
FROM kaggle.movies LIMIT 100

If write access to the source table is available, the summary column can also be added in place.

--- update the existing table add new column for summaries
ALTER TABLE my_db.movies ADD COLUMN summary VARCHAR;
--- populate the column with summaries
UPDATE my_db.movies SET summary = prompt('summarize review in one sentence' || overview);

The movies table now contains a new column summary with one-sentence summaries of the movies.

SELECT * FROM my_db.movies;
titleoverviewsummary
Toy StoryLed by Woody, Andy's toys live happily [...]In "Toy Story," Woody's jealousy of the new [...]
JumanjiWhen siblings Judy and Peter discover [...]In this thrilling adventure, siblings Judy and [...]
.........

Structured Data Extraction​

The prompt function can be used to extract structured data from text.

The example is again based on the same sample movies dataset from MotherDuck's sample data database. This time we aim to extract structured metadata from the movie's overview description.

We are interested in the main characters mentioned in the descriptions, as well as the movie's genre and a rating of how much action the movie contains, given a scale of 1 (no action) to 5 (lot of action). For this, we make use of the struct and struct_descr parameters, which will result in structured output.

--- update the existing table add new column for structured metadata
ALTER TABLE my_db.movies ADD COLUMN metadata STRUCT(main_characters VARCHAR[], genre VARCHAR, action INTEGER);
--- populate the column with structured information about characters, genre, and degree of action,
--- for which we provide a more detailed description using the struct_decr argument.
UPDATE my_db.movies SET metadata =
prompt(overview,
struct:={main_characters: 'VARCHAR[]', genre: 'VARCHAR', action: 'INTEGER'},
struct_descr:={action: 'rate on a scale from 1 (no) to 5 (a lot) how much action the movie contains'})

The resulting metadata field is of type STRUCT and can be accessed as follows:

SELECT title, 
overview,
metadata.main_characters,
metadata.genre,
metadata.action
FROM my_db.movies;
titleoverviewmetadata.main_charactersmetadata.genreaction
Toy StoryLed by Woody, Andy's toys live happily [...]['"Woody"', '"Buzz Lightyear"', '"Andy"', '"Mr. Potato Head"', '"Rex"']Animation, Adventure, Comedy3
JumanjiWhen siblings Judy and Peter discover [...]['"Judy Shepherd"', '"Peter Shepherd"', '"Alan Parrish"']Adventure, Fantasy, Family4
...............

Retrieval-Augmented Generation (RAG)​

The prompt function can be combined with similarity search on embeddings to build a RAG pipeline:

In this example, we build an ask_question macro that takes an input question, computes an embedding of it, and retrieves the 3 most similar movies. The retrieved movie titles and descriptions are then added, together with the user's question, to the prompt function's input. Finally, we return the input question and the prompt function's response to the user.

CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
SELECT question_text as question, prompt(
'User asks the following question:\n' || question_text || '\n\n' ||
'Here is some additional information:\n' ||
STRING_AGG('Title: ' || title || '; Description: ' || overview, '\n') || '\n' ||
'Please answer the question, given the additional information.',
model = 'gpt-4o'
) AS response
FROM (
SELECT title, overview
FROM kaggle.movies
ORDER BY array_cosine_similarity(overview_embeddings, embedding(question_text)) DESC
LIMIT 3
)
);

SELECT question, response FROM ask_question('Can you recommend some good sci-fi movies about AI?')

This will result in the following output.

questionresponse
Can you recommend some good sci-fi movies about AI?Certainly! Here are some great sci-fi movies about AI that you might enjoy: [...]
warning

When passing free-text arguments from external sources to the prompt function (e.g. user question in a RAG application), always use prepared statements.

Doing a prompt call with a prepared statement in Python:

con.execute("""[... register macros ... ]""")

user_query = "Can you recommend some good sci-fi movies about AI?"
con.execute("""
SELECT response FROM ask_question(?)
""", [user_query])

Error handling​

When usage limits have been reached while the query is running or an unexpected error occurs while computing the prompt responses, the prompt function does not fail but will return null values for the affected rows.

To check if all responses were computed successfully, check if any values in the resulting column are null.

-- check for null values in response column
SELECT count(*) FROM my_db.movies WHERE response IS NULL AND "text" IS NOT NULL;

Missing values can be filled in like this:

-- fill in missing prompt responses values
UPDATE my_db.movies SET response = prompt("text") WHERE response IS NULL AND "text" IS NOT NULL;

Notes​

These capabilities are provided by MotherDuck's integration with OpenAI and inputs to the prompt function will be processed by OpenAI.

For availability and usage limits, see MotherDuck's Pricing Model. If higher usage limits are needed, please reach out directly to Slack support channel or support@motherduck.com.