dlt
dlt is an open-source Python library that loads data from various, often messy, data sources into well-structured, live datasets. It offers a lightweight interface for extracting data from REST APIs, SQL databases, cloud storage, Python data structures, and more.
dlt is designed to be easy to use, flexible, and scalable:
- dlt infers schemas and data types, normalizes the data, and handles nested data structures.
- dlt supports a variety of popular destinations and provides an interface for adding custom destinations, for example to build reverse ETL pipelines.
- dlt can be deployed anywhere Python runs, be it on Airflow, serverless functions, or any other cloud deployment of your choice.
- dlt automates pipeline maintenance with schema evolution and schema and data contracts.
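To make the schema-inference and normalization points concrete, here is a toy, stdlib-only sketch of the general idea: nested dictionaries are flattened into `parent__child` columns, nested lists become child tables linked back to their parent row, and column types are inferred from the values. This is not dlt's implementation. The functions `normalize` and `infer_schema`, the `_id`/`_parent_id` columns, and the fall-back-to-`text` rule are hypothetical; dlt uses the same double-underscore separator but adds its own `_dlt_*` bookkeeping columns and resolves type conflicts in its own way.

```python
from itertools import count
from typing import Any, Iterator, Optional

Row = dict[str, Any]


def _flatten(record: Row, prefix: str = "") -> tuple[Row, dict[str, list]]:
    """Split one record into flat scalar columns and nested lists."""
    row: Row = {}
    lists: dict[str, list] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects are flattened into columns joined with "__".
            sub_row, sub_lists = _flatten(value, f"{name}__")
            row.update(sub_row)
            lists.update(sub_lists)
        elif isinstance(value, list):
            # Nested lists are set aside; they become child tables.
            lists[name] = value
        else:
            row[name] = value
    return row, lists


def normalize(table: str, records: list[Any]) -> dict[str, list[Row]]:
    """Turn nested records into {table_name: rows}, linking children to parents."""
    tables: dict[str, list[Row]] = {}
    ids: Iterator[int] = count(1)  # hypothetical surrogate keys

    def visit(name: str, record: Any, parent_id: Optional[int]) -> None:
        # Scalars inside lists are wrapped so that every row is a dict.
        row, lists = _flatten(record if isinstance(record, dict) else {"value": record})
        row["_id"] = next(ids)
        if parent_id is not None:
            row["_parent_id"] = parent_id
        tables.setdefault(name, []).append(row)
        for list_name, items in lists.items():
            for item in items:
                visit(f"{name}__{list_name}", item, row["_id"])

    for record in records:
        visit(table, record, None)
    return tables


def infer_schema(rows: list[Row]) -> dict[str, str]:
    """Map each column to a type name; conflicting types fall back to 'text'."""
    type_names = {bool: "bool", int: "bigint", float: "double", str: "text"}
    schema: dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue  # nulls carry no type information
            inferred = type_names.get(type(value), "text")
            if schema.setdefault(column, inferred) != inferred:
                schema[column] = "text"
    return schema


data = [
    {"id": 1, "name": "Alice", "address": {"city": "Berlin"}, "tags": ["a", "b"]},
    {"id": 2, "name": "Bob", "address": {"city": "Paris"}, "tags": []},
]
tables = normalize("users", data)
print(list(tables))  # ['users', 'users__tags']
print(infer_schema(tables["users"]))
```

The parent/child split keeps every table flat and relational, which is what lets arbitrarily nested JSON land in SQL destinations.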
dlt integrates well with DuckDB (its developers have also used DuckDB as a local cache) and, by extension, with MotherDuck.
You can read more about the MotherDuck integration in the official documentation.