Teaching Your LLM About DuckDB the Right Way: How to Fix Outdated Documentation

2025/07/15 - 4 min read


Most developers are still feeding their AI assistants stale, fragmented documentation. There's a better way.

For instance, here's what ChatGPT, Claude, and Gemini answer when asked "What's the latest DuckDB version your data has been trained on?":

| AI Assistant | DuckDB Version | Training Data Cutoff |
|---|---|---|
| GPT-4o | 0.10.2 | May 2024 |
| Gemini 2.5 Pro | 1.0.0 | June 2024 |
| Claude Sonnet 4 | 1.1.3 | Late 2024 |

Projects like DuckDB (and MotherDuck) move incredibly fast. Even three-month-old documentation can be badly out of date, and you end up fixing generated code that calls methods that no longer exist. Compared to version 1.3.2 (current at the time of writing), version 0.10 feels prehistoric.

So how do you ensure your AI gets the latest docs when you need them?

In this post, we'll explore how to give your LLM up-to-date documentation through llms.txt or Cursor's Docs feature, using DuckDB and MotherDuck as examples.

A new standard for AI: llms.txt

Traditional files like robots.txt and sitemap.xml tell search engines what to crawl and how your site is structured, but they weren't built with large language models (LLMs) in mind. That's where llms.txt (llmstxt.org) comes in: a growing standard tailored specifically for LLMs, offering content in a format that's easier for AI to read and reason about.

As LLMs become a more common way for developers and users to access documentation, clarity and structure matter more than ever. Parsing raw HTML often leads to messy results: cluttered navigation, JavaScript, and styling tags, all of which are noise from the perspective of an AI model.

In fact, we may already be at the point where LLMs are consuming developer docs more than humans do. Andrej Karpathy even called this shift out in a recent post.

The llms.txt spec introduces two files:

  1. /llms.txt – a lightweight, structured index of your docs, similar in spirit to sitemap.xml, but more markdown-friendly.
  2. /llms-full.txt – a single, comprehensive text dump of all your documentation, ready for ingestion.
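To make the index format concrete, here is a minimal Python sketch that extracts the links from an llms.txt file. It assumes the layout proposed at llmstxt.org: an H1 title, an optional blockquote summary, then H2 sections containing Markdown link lists such as `- [Title](url): notes`. The `parse_llms_txt` function, the `DocLink` class, and the sample content are hypothetical and use example.com URLs rather than any real site's index.

```python
import re
from dataclasses import dataclass

# Matches list items such as "- [Title](https://example.com/page.md): optional notes"
LINK_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?\s*$"
)


@dataclass
class DocLink:
    section: str  # the H2 heading the link appears under
    title: str
    url: str
    notes: str


def parse_llms_txt(text: str) -> tuple[str, list[DocLink]]:
    """Return the H1 title and every linked page, tagged with its H2 section."""
    title = ""
    section = ""
    links: list[DocLink] = []
    for line in text.splitlines():
        if line.startswith("# ") and not title:
            title = line[2:].strip()
        elif line.startswith("## "):
            section = line[3:].strip()
        else:
            match = LINK_RE.match(line)
            if match:
                notes = (match["notes"] or "").strip()
                links.append(DocLink(section, match["title"], match["url"], notes))
    return title, links


# Hypothetical sample index illustrating the layout.
SAMPLE = """# Example Project

> A hypothetical project used to illustrate the llms.txt layout.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Install and run a first query
- [Python client](https://example.com/docs/python.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""

if __name__ == "__main__":
    title, links = parse_llms_txt(SAMPLE)
    print(title)
    for link in links:
        print(f"[{link.section}] {link.title} -> {link.url}")
```

Because each entry points at a Markdown page, a tool can use this index to fetch only the pages relevant to a question instead of the full llms-full.txt dump.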

In addition, the specification recommends that websites offering content potentially useful to LLMs also provide a clean Markdown version of each page. This version should be accessible at the same URL as the original page, with .md appended.
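The .md convention is simple enough to express in a few lines. This minimal Python sketch (the `markdown_url` helper is hypothetical, not part of any library) appends .md to a page's path, drops any query string or fragment, and rejects directory-style URLs ending in a slash, since it makes no assumption about how a given site maps those to Markdown.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the .md variant of a docs page URL by appending .md to its path."""
    parts = urlsplit(page_url)
    # Directory-style URLs are out of scope for this sketch.
    if not parts.path or parts.path.endswith("/"):
        raise ValueError(f"expected a URL ending in a page name, got {page_url!r}")
    # Leave URLs that already point at Markdown unchanged.
    path = parts.path if parts.path.endswith(".md") else parts.path + ".md"
    # Query string and fragment are dropped: they are not part of the page path.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(markdown_url("https://duckdb.org/docs/stable/clients/cpp"))
```

For example, the DuckDB C++ client page maps to https://duckdb.org/docs/stable/clients/cpp.md.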

With these files in place, documentation updates become much easier to propagate, especially to tools that rely on LLMs to serve answers and insights.

Where to find llms.txt and llms-full.txt for DuckDB and MotherDuck?

Typically, you'll find them at the root of the website (mywebsite.com/llms.txt) or sometimes under a significant subpath such as mywebsite.com/docs/llms.txt.

You can also try appending .md to any webpage URL to see if the site provides markdown versions.
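This lookup can be scripted. Below is a minimal Python sketch, using only the standard library, that tries the two conventional locations in order and returns the first one that answers. The `candidate_urls`, `fetch_text`, and `find_llms_txt` helpers are hypothetical; the `fetch` parameter lets you swap in your own HTTP client, or a stub for testing.

```python
from typing import Callable, Optional
from urllib.request import urlopen

# Locations to try, in order, relative to the site root.
CANDIDATE_PATHS = ("/llms.txt", "/docs/llms.txt")


def candidate_urls(site: str) -> list[str]:
    """Build the candidate llms.txt URLs for a site such as 'https://example.com'."""
    base = site.rstrip("/")
    return [base + path for path in CANDIDATE_PATHS]


def fetch_text(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a URL as UTF-8 text; return None on HTTP, network, or decoding errors."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except (OSError, UnicodeDecodeError):  # URLError and HTTPError subclass OSError
        return None


def find_llms_txt(
    site: str, fetch: Callable[[str], Optional[str]] = fetch_text
) -> Optional[tuple[str, str]]:
    """Return (url, content) for the first candidate that responds, or None."""
    for url in candidate_urls(site):
        content = fetch(url)
        if content:
            return url, content
    return None
```

Some sites answer unknown paths with an HTML page instead of a 404, so a real tool would also check that the response actually looks like Markdown.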

For DuckDB, you'll find them at:

You can also append .md to any docs page URL to get its Markdown version, for instance: https://duckdb.org/docs/stable/clients/cpp.md

For MotherDuck, you'll find them at:

You can also append .md to any docs page URL to get its Markdown version. To make it even easier, every page of our docs has a drop-down menu that links to llms.txt and offers a Copy as Markdown option.

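If you paste the same pages into a chat often, that step can also be scripted. The hypothetical `build_prompt` helper below is a minimal Python sketch: it assumes you have already downloaded the Markdown (for example llms-full.txt or a few .md pages), labels each source, and trims the combined context to a character budget. The budget is an assumption standing in for the model's context window, which is actually measured in tokens.

```python
def build_prompt(question: str, docs: dict[str, str], max_chars: int = 50_000) -> str:
    """Prepend labelled documentation excerpts to a question.

    `docs` maps a source label (e.g. a .md URL) to its Markdown text. Sources are
    added in order until the context reaches `max_chars`; the last one may be cut off.
    """
    sep = "\n\n"
    sections: list[str] = []
    remaining = max_chars
    for label, text in docs.items():
        header = f"### Source: {label}\n"
        # Characters this section costs before its body: header plus separator.
        overhead = len(header) + (len(sep) if sections else 0)
        if remaining <= overhead:
            break  # no room left for any body text
        body = text[: remaining - overhead]
        sections.append(header + body)
        remaining -= overhead + len(body)
    context = sep.join(sections)
    return f"Use the documentation below to answer the question.\n\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = {"https://duckdb.org/docs/stable/clients/cpp.md": "(downloaded Markdown goes here)"}
    print(build_prompt("How do I open a database file from C++?", docs))
```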

Feeding your LLMs with Cursor docs

The llms.txt and Markdown files we discussed work great when you copy and paste them into any LLM chat. However, if you're using Cursor, there's an even better, automated way that saves you from copy-pasting every time.

In Cursor, under Settings > Cursor Settings > Features > Docs, you can add documentation sources to be used as context in your prompts. These sources are crawled and indexed. They can be documentation websites, API docs, or even raw GitHub code.

When you add a custom documentation URL, you give it a name (an alias for your prompts), and Cursor crawls and indexes it for you. Once these are added, you can reference them in your prompt using @docs <my alias name>.


The next time you want to ask something about DuckDB or MotherDuck, just type @ and select the documentation.

Going further with MCP

Keeping your AI assistants updated with fresh documentation doesn't have to be a manual chore. Whether you're using llms.txt files for quick copy-paste workflows or Cursor's automated docs feature for seamless integration, these approaches ensure your AI has access to the latest information when you need it most.

As more projects adopt the llms.txt standard and protocols like the Model Context Protocol (MCP) gain traction, the gap between rapidly evolving codebases and AI knowledge will continue to shrink. Your future self (and your code) will thank you for investing in better AI-assisted development.

If you want your AI to actually run DuckDB/MotherDuck queries (not just understand the docs), MotherDuck has an official DuckDB MCP server that lets your AI execute queries directly against your data.

In the meantime, take care of your LLMs, and keep prompting.


