data sources
Data sources are the origin points of information in a data pipeline or analytics workflow: the databases, APIs, files, and streaming platforms that supply raw or structured data for processing and analysis. Common examples include relational databases like PostgreSQL, cloud storage services like Amazon S3, SaaS application APIs like Salesforce, and streaming platforms such as Apache Kafka. DuckDB can query many data sources directly with SQL, often without an explicit loading step. For instance, you can query a CSV file stored on disk or in cloud storage by using its path in place of a table name:
SELECT * FROM 'path/to/file.csv';
Or query a Parquet file:
SELECT * FROM 'data.parquet';
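When format auto-detection is not enough, or the file lives in cloud storage, the same queries can be written with DuckDB's explicit reader functions. The following is a minimal sketch rather than a definitive setup: the bucket name my-bucket is a hypothetical placeholder, and the S3 query assumes the httpfs extension can be installed and that credentials are already configured.

```sql
-- Read a local CSV with explicit options instead of relying on auto-detection.
-- 'path/to/file.csv' is the placeholder path used above.
SELECT *
FROM read_csv('path/to/file.csv', header = true, delim = ',');

-- Read a Parquet file from S3 (hypothetical bucket and object key).
-- Assumes S3 credentials are already set up for this session.
INSTALL httpfs;
LOAD httpfs;
SELECT *
FROM read_parquet('s3://my-bucket/data.parquet');
```

Spelling out the reader function makes the file format and parsing options visible in the query, which can help when a file's extension or contents would otherwise be ambiguous.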
Understanding various data sources and how to connect to them is crucial for aspiring data professionals, as it forms the foundation for data integration and analysis workflows.