---
title: dlt (data load tool)
description: Use dlt to extract and load data from APIs and databases into MotherDuck with automatic schema inference.
---

[dlt](https://dlthub.com/docs/intro) is an open-source Python library that loads data from various, often messy data sources into well-structured, live datasets. It offers a lightweight interface for extracting data from REST APIs, SQL databases, cloud storage, Python data structures, and more.

dlt is designed to be easy to use, flexible, and scalable:

* dlt infers schemas and data types, normalizes the data, and handles nested data structures.
* dlt supports a variety of popular destinations and has an interface to add custom destinations to create reverse ETL pipelines.
* dlt can be deployed anywhere Python runs, be it on Airflow, serverless functions, or any other cloud deployment of your choice.
* dlt automates pipeline maintenance with schema evolution and schema and data contracts (see the sketch after this list).
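
As an illustration of schema contracts, here is a minimal sketch of a resource-level contract, assuming the contract modes described in the dlt docs (`"evolve"`, `"freeze"`); the `users` resource itself is made up:

```python
import dlt


# Hypothetical resource: let new tables evolve, but freeze columns so that
# records with unexpected fields fail the load instead of silently
# changing the schema.
@dlt.resource(schema_contract={"tables": "evolve", "columns": "freeze"})
def users():
    yield {"id": 1, "name": "Alice"}
```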

dlt integrates well with DuckDB (the dlt team also uses it as a local [cache](https://dlthub.com/blog/dltplus-project-cache-in-early-access)), and therefore with MotherDuck.

You can learn more about the MotherDuck integration in the [official documentation](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck).
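
To run the examples on this page, install dlt with the MotherDuck extra:

```bash
pip install "dlt[motherduck]"
```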

## Authentication

To authenticate with MotherDuck, you have two options:

1. **Environment variable:** export your `motherduck_token` as an environment variable:

```bash
export motherduck_token="your_motherduck_token"
```

2. **Local development:** add the token to `.dlt/secrets.toml`:

```toml
[destination.motherduck.credentials]
password = "your_motherduck_token"
```
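
You can also pass credentials explicitly in code. Here is a minimal sketch using dlt's `motherduck` destination factory with a MotherDuck connection string (the database name `my_db`, the pipeline and dataset names, and the token are placeholders):

```python
import dlt

# Hypothetical setup: supply the token via a MotherDuck connection string
# instead of secrets.toml or the motherduck_token environment variable.
pipeline = dlt.pipeline(
    pipeline_name="auth_example",
    destination=dlt.destinations.motherduck(
        credentials="md:///my_db?motherduck_token=your_motherduck_token"
    ),
    dataset_name="auth_example_data",
)
```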

## Minimal example

Below is a minimal example of using dlt to load dummy, GitHub-like data (standing in for a REST API source) into a MotherDuck database:

```python
import dlt
import random
from datetime import datetime
from typing import Dict, Iterator, List, Optional, Sequence

from dlt.sources import DltResource


@dlt.source(name="dummy_github")
def dummy_source(repos: Optional[List[str]] = None) -> Sequence[DltResource]:
    """
    A minimal dlt source that generates dummy GitHub-like data.

    Args:
        repos (Optional[List[str]]): A list of dummy repository names.

    Returns:
        Sequence[DltResource]: A sequence of resources with dummy data.
    """
    if repos is None:
        repos = ["dummy/repo1", "dummy/repo2"]
        
    return (
        dummy_repo_info(repos),
        dummy_languages(repos),
    )


@dlt.resource(write_disposition="replace")
def dummy_repo_info(repos: List[str]) -> Iterator[Dict]:
    """
    Generates dummy repository information.
    
    Args:
        repos (List[str]): List of repository names.
        
    Yields:
        Iterator[Dict]: An iterator over dummy repository data.
    """
    for repo in repos:
        owner, name = repo.split("/")
        yield {
            "id": random.randint(10000, 99999),
            "name": name,
            "full_name": repo,
            "owner": {"login": owner},
            "description": f"This is a dummy repository for {repo}",
            "created_at": datetime.now().isoformat(),
            "updated_at": datetime.now().isoformat(),
            "stargazers_count": random.randint(0, 1000),
            "forks_count": random.randint(0, 500),
        }


@dlt.resource(write_disposition="replace")
def dummy_languages(repos: List[str]) -> Iterator[Dict]:
    """
    Generates dummy language data for repositories in an unpivoted format.
    
    Args:
        repos (List[str]): List of repository names.
        
    Yields:
        Iterator[Dict]: An iterator over dummy language data.
    """
    languages = ["Python", "JavaScript", "TypeScript", "C++", "Rust", "Go"]
    
    for repo in repos:
        # Generate 2-4 random languages for each repo
        num_languages = random.randint(2, 4)
        selected_languages = random.sample(languages, num_languages)
        
        for language in selected_languages:
            yield {
                "repo": repo,
                "language": language,
                "bytes": random.randint(1000, 100000),
                "check_time": datetime.now().isoformat(),
            }


def run_minimal_example():
    """
    Runs a minimal example pipeline that loads dummy GitHub data to MotherDuck.
    """
    # Define some dummy repositories
    repos = ["example/repo1", "example/repo2", "example/repo3"]
    
    # Configure the pipeline
    pipeline = dlt.pipeline(
        pipeline_name="minimal_github_pipeline",
        destination="motherduck",
        dataset_name="minimal_example",
    )
    
    # Create the data source
    data = dummy_source(repos)
    
    # Run the pipeline with all resources
    info = pipeline.run(data)
    print(info)
    
    # Show what was loaded
    print("\nLoaded data:")
    print(f"- {len(repos)} repositories")
    print(f"- Languages for {len(repos)} repositories")


if __name__ == "__main__":
    run_minimal_example()
```

dlt revolves around three core concepts:

* **Sources** define where the data comes from.
* **Resources** represent structured units of data within a source.
* **Pipelines** manage the data loading process.

In the example above:

* `dummy_source` defines a source that simulates GitHub-like data.
* `dummy_repo_info` and `dummy_languages` are resources producing repository and language data.
* A pipeline loads this data into MotherDuck.

The core integration with MotherDuck is defined in the pipeline configuration:

```python
pipeline = dlt.pipeline(
    pipeline_name="minimal_github_pipeline",
    destination="motherduck",
    dataset_name="minimal_example",
)
```

Setting `destination="motherduck"` tells dlt to load the data into MotherDuck.
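
Once the pipeline has run, you can check what was loaded by querying the destination. Below is a minimal sketch using dlt's SQL client, which connects with the same credentials and dataset the pipeline wrote to (the table and column names match the `dummy_repo_info` resource above):

```python
# Query the loaded table through the pipeline's own SQL client.
with pipeline.sql_client() as client:
    with client.execute_query(
        "SELECT full_name, stargazers_count FROM dummy_repo_info"
    ) as cursor:
        for full_name, stars in cursor.fetchall():
            print(f"{full_name}: {stars} stars")
```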
