MotherDuck Now Speaks Postgres! Our pg_endpoint is now live!Demo - April 21

Skip to main content

Tag workloads with custom user agents

Connecting to MotherDuck with custom_user_agent parameter will tag queries to identify which workload issued them. That workload can represent an integration, pipeline, tenant, or internal service.

Those tags appear in MD_INFORMATION_SCHEMA.QUERY_HISTORY, so organization admins can inspect tagged activity, group it by workload, and use that breakdown in internal reporting.

note

MD_INFORMATION_SCHEMA.QUERY_HISTORY is a preview feature available on Business plans and only for organization admins.

1. choose a tagging convention

Use the custom_user_agent format described in Choose a custom_user_agent format.

You can use this pattern even if you are only tagging existing workloads for reporting or allocation. You do not need to build a full customer-facing integration.

Recommended format:

  • integration/version(metadata1,metadata2) with optional version and metadata
  • Avoid spaces in the integration and version parts
  • If you want to group by a single workload label later, keep that label in the first metadata position

Examples:

  • catalogsync
  • catalogsync/5.1.5.1
  • catalogsync/5.1.5.1(batchload,teamfinance)
  • customerportal/5.1.5.1(tenant42,eucentral1)

2. understand what QUERY_HISTORY stores

QUERY_HISTORY.USER_AGENT stores the full DuckDB user agent, not only your custom tag. When custom_user_agent is set, the value looks like this:

duckdb/<version>(<platform>) <api> <custom_user_agent>

Representative values:

QUERY_HISTORY.USER_AGENTExtracted custom_tagExtracted integration_nameExtracted metadata
duckdb/v1.5.1(osx_arm64) capi catalogsync/5.1.5.1(batchload,teamfinance)catalogsync/5.1.5.1(batchload,teamfinance)catalogsyncbatchload,teamfinance
duckdb/v1.5.1(wasm_eh) motherduck-wasm customerportal/5.1.5.1(tenant42,eucentral1)customerportal/5.1.5.1(tenant42,eucentral1)customerportaltenant42,eucentral1
duckdb/v1.5.1(linux_amd64) cppNULL

3. set custom_user_agent

Example in Python:

con = duckdb.connect("md:analytics", config={
"motherduck_token": token,
"custom_user_agent": "catalogsync/5.1.5.1(batchload,teamfinance)"
})

For other languages and frameworks, see the language and framework examples for setting custom_user_agent.

4. inspect recent tagged queries

Use this query to inspect recent QUERY_HISTORY rows and verify that your tags are being extracted the way you expect:

with tagged_queries as (
select
start_time,
user_name,
instance_type,
user_agent,
regexp_extract(user_agent, '^(?:[^ ]+ ){2}(.+)$', 1) as custom_tag
from MD_INFORMATION_SCHEMA.QUERY_HISTORY
where regexp_matches(user_agent, '^(?:[^ ]+ ){2}.+$')
order by start_time desc
limit 20
),
parsed as (
select
start_time,
user_name,
instance_type,
user_agent,
custom_tag,
regexp_extract(custom_tag, '^([^/( ]+)', 1) as integration_name,
nullif(regexp_extract(custom_tag, '\\(([^)]*)\\)', 1), '') as metadata
from tagged_queries
)
select
start_time,
user_name,
instance_type,
user_agent,
custom_tag,
integration_name,
metadata
from parsed
order by start_time desc

The extraction logic is:

  • regexp_extract(user_agent, '^(?:[^ ]+ ){2}(.+)$', 1) strips the built-in DuckDB and API tokens and returns your custom tag
  • regexp_extract(custom_tag, '^([^/( ]+)', 1) extracts the integration name
  • regexp_extract(custom_tag, '\\(([^)]*)\\)', 1) extracts the metadata payload inside parentheses

5. group tagged activity by workload

This example groups tagged queries by integration, the first metadata value, and duckling size over the last 7 days.

with tagged_queries as (
select
start_time,
end_time,
instance_type,
regexp_extract(user_agent, '^(?:[^ ]+ ){2}(.+)$', 1) as custom_tag
from MD_INFORMATION_SCHEMA.QUERY_HISTORY
where start_time >= now() - interval 7 day
and regexp_matches(user_agent, '^(?:[^ ]+ ){2}.+$')
),
parsed as (
select
coalesce(nullif(regexp_extract(custom_tag, '^([^/( ]+)', 1), ''), custom_tag) as integration_name,
nullif(split_part(regexp_extract(custom_tag, '\\(([^)]*)\\)', 1), ',', 1), '') as workload_name,
instance_type,
date_diff('second', start_time, end_time) as elapsed_seconds
from tagged_queries
)
select
integration_name,
coalesce(workload_name, 'unlabeled') as workload_name,
instance_type,
count(*) as queries,
sum(elapsed_seconds) as total_elapsed_seconds,
avg(elapsed_seconds) as avg_elapsed_seconds
from parsed
group by all
order by total_elapsed_seconds desc

If you want to group by the full metadata string instead, replace the workload_name expression with:

nullif(regexp_extract(custom_tag, '\\(([^)]*)\\)', 1), '') as workload_name

6. use tagged activity for internal allocation

Some teams use tagged query history as an input to internal chargeback or cost allocation. One approach is to calculate each workload's share of tracked query time and apply that share to a monthly invoice outside of MotherDuck.

with tagged_queries as (
select
start_time,
end_time,
regexp_extract(user_agent, '^(?:[^ ]+ ){2}(.+)$', 1) as custom_tag
from MD_INFORMATION_SCHEMA.QUERY_HISTORY
where start_time >= date_trunc('month', now())
and regexp_matches(user_agent, '^(?:[^ ]+ ){2}.+$')
),
workload_usage as (
select
coalesce(
nullif(split_part(regexp_extract(custom_tag, '\\(([^)]*)\\)', 1), ',', 1), ''),
regexp_extract(custom_tag, '^([^/( ]+)', 1)
) as workload_name,
sum(date_diff('second', start_time, end_time)) as elapsed_seconds
from tagged_queries
group by 1
),
totals as (
select sum(elapsed_seconds) as total_elapsed_seconds
from workload_usage
)
select
workload_name,
elapsed_seconds,
elapsed_seconds::double / nullif(total_elapsed_seconds, 0) as tracked_usage_share
from workload_usage, totals
order by tracked_usage_share desc

This is an internal accounting convention, not a MotherDuck billing feature. For the billing model itself, including Pulse compared to fixed-size ducklings and cooldown behavior, see Understanding the pricing model and Duckling sizes.