AWS Glue

AWS Glue is a serverless data integration service for preparing and moving data with Spark jobs, crawlers, and the AWS Glue Data Catalog. AWS Glue jobs can connect to MotherDuck through the MotherDuck Postgres endpoint using Glue's PostgreSQL JDBC support.

How it works with MotherDuck

  1. Create a MotherDuck access token.
  2. Configure the AWS Glue job with a PostgreSQL JDBC connection to the MotherDuck Postgres endpoint.
  3. Use postgres as the user, the MotherDuck token as the password, and md: or a specific MotherDuck database as the database name.
  4. Set Glue's JDBC dbtable option to the table or view that the job should read.
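  5. Make sure the Glue job's network configuration can reach the public MotherDuck endpoint.

from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Standard Glue job setup; glueContext is the entry point for reading data.
glueContext = GlueContext(SparkContext.getOrCreate())

# JDBC options for the MotherDuck Postgres endpoint.
connection_options = {
    "url": "jdbc:postgresql://pg.us-east-1-aws.motherduck.com:5432/md:?sslmode=require",
    "dbtable": "main.my_table",  # table or view to read
    "user": "postgres",
    "password": "<motherduck_token>",  # MotherDuck access token
}

# Read the table into a Glue DynamicFrame.
dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options=connection_options,
)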
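
To avoid hard-coding the token in the job script, steps 1 to 3 can be wrapped in a small helper that assembles the options dictionary. This is a minimal sketch: the function name `build_connection_options` and the `MOTHERDUCK_TOKEN` environment variable are assumptions made for illustration, not part of Glue or MotherDuck, and the default host and port are simply copied from the example above.

```python
import os


def build_connection_options(token, dbtable, database="md:",
                             host="pg.us-east-1-aws.motherduck.com", port=5432):
    """Assemble Glue JDBC options for the MotherDuck Postgres endpoint.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if not token:
        raise ValueError("a MotherDuck access token is required")
    return {
        # database is "md:" or a specific MotherDuck database name
        "url": f"jdbc:postgresql://{host}:{port}/{database}?sslmode=require",
        "dbtable": dbtable,
        "user": "postgres",   # fixed user name for the Postgres endpoint
        "password": token,    # the MotherDuck token is the password
    }


# MOTHERDUCK_TOKEN is an assumed variable name; the placeholder keeps the
# sketch runnable when the variable is not set.
token = os.environ.get("MOTHERDUCK_TOKEN", "<motherduck_token>")
connection_options = build_connection_options(token, "main.my_table")
```

The resulting dictionary can be passed as `connection_options` to `glueContext.create_dynamic_frame.from_options` exactly as in the example above.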

Use this route when a Glue job needs to read MotherDuck data as part of an AWS ETL workflow. For high-volume loading into MotherDuck, it is often simpler to write files to S3 from Glue and load those files from MotherDuck.
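
The S3 route might look roughly like the sketch below. The first function writes a DynamicFrame to S3 as Parquet with Glue's `write_dynamic_frame.from_options`; the second builds a SQL statement that could then be run in MotherDuck using DuckDB's `read_parquet`. The function names, bucket, prefix, and target table are hypothetical, and the sketch assumes that the written files end in `.parquet` and that MotherDuck has credentials to read the bucket.

```python
def write_to_s3(glue_context, dynamic_frame, s3_prefix):
    """Write a DynamicFrame to S3 as Parquet using Glue's S3 sink.

    glue_context is expected to be an awsglue GlueContext created in the job.
    """
    glue_context.write_dynamic_frame.from_options(
        frame=dynamic_frame,
        connection_type="s3",
        connection_options={"path": s3_prefix},
        format="parquet",
    )


def build_load_statement(table, s3_prefix):
    """Return a SQL statement that loads the Parquet files under s3_prefix."""
    # Match every Parquet file directly under the prefix.
    pattern = s3_prefix.rstrip("/") + "/*.parquet"
    return f"CREATE OR REPLACE TABLE {table} AS SELECT * FROM read_parquet('{pattern}');"


# Hypothetical bucket and table names, for illustration only.
print(build_load_statement("main.my_table", "s3://my-bucket/exports/my_table/"))
```

Splitting the work this way keeps the bulk transfer on S3 and leaves the Glue job responsible only for writing files.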