Replicating SQL Server tables to MotherDuck

This page will serve to show basic patterns for using Python to connect to SQL Server, read data into a dataframe, connect to MotherDuck, and then writing the data from the dataframe into MotherDuck. For more complex replication scenarios, please take a look at our ingestion partners.

To skip the documentation and look at the entire script, expand the element below:

Python script

import pyodbc

# Define your connection parameters
server = 'ip_address'
database = 'master'         # or use your database name
username = 'your_username'
password = 'your_password'  # consider using a secret manager or .env
port = 1433  # default SQL Server port

# Define the connection string for ODBC Driver 17
connection_string = (
    f"DRIVER={{ODBC Driver 17 for SQL Server}};"
    f"SERVER={server},{port};"
    f"DATABASE={database};"
    f"UID={username};"
    f"PWD={password};"
)

# Connect to SQL Server
try:
    connection = pyodbc.connect(connection_string)
    print("Connection successful.")
except pyodbc.Error as e:
    print(f"Error: {e}")
finally:
    connection.close()

import pandas as pd

try:
    connection = pyodbc.connect(connection_string)
    query = "SELECT * FROM AdventureWorks2022.Production.BillOfMaterials"

    # Execute the query using pyodbc
    cursor = connection.cursor()
    cursor.execute(query)

    # Fetch the column names and data
    columns = [column[0] for column in cursor.description]
    data = cursor.fetchall()

    # Convert the data into a DataFrame
    df = pd.DataFrame.from_records(data, columns=columns)
finally:
    connection.close()

import duckdb

motherduck_token = 'your_token'

# Attach using the MOTHERDUCK_TOKEN
duckdb.sql(f"ATTACH 'md:my_db?MOTHERDUCK_TOKEN={motherduck_token}'")

# Create or replace table in the attached database
duckdb.sql(
    """
    CREATE OR REPLACE TABLE my_db.main.BillOfMaterials AS
    SELECT * FROM df
    """
)

SQL Server Authentication

SQL Server supports multiple methods of authentication - for the purpose of this example, we will use username/password authentication and pyodbc, along with ODBC Driver 17 for SQL Server. It should be noted that 'ODBC Driver 18 for SQL Server' is also available and includes support for some newer SQL Server features, but for the sake of compatibility, this example will use 17.

Consider the following authentication example:

import pyodbc

# Define your connection parameters
server = 'ip_address'
database = 'master'         # or use your database name
username = 'your_username'
password = 'your_password'  # consider using a secret manager or .env
port = 1433  # default SQL Server port

# Define the connection string for ODBC Driver 17
connection_string = (
    f"DRIVER={{ODBC Driver 17 for SQL Server}};"
    f"SERVER={server},{port};"
    f"DATABASE={database};"
    f"UID={username};"
    f"PWD={password};"
)

# Connect to SQL Server
try:
    connection = pyodbc.connect(connection_string)
    print("Connection successful.")
except pyodbc.Error as e:
    print(f"Error: {e}")
finally:
    connection.close()

This will set your credentials, and then attempt to connect to your server with pyodbc.connect, and return an error if it fails.

Reading a SQL Server table into a dataframe

Once you have authenticated, you can define arbitrary queries and then execute them with pd.read_sql, using the query and connection objects. For the purpose of this example, we are using SQL Server 2022 along with the AdventureWorks OLTP database.

note

While pandas is a great library, it is not particularly well-suited for very large tables. To learn more about using buffers and alternative libraries, check out this link.

import pandas as pd

try:
    connection = pyodbc.connect(connection_string)
    query = "SELECT * FROM AdventureWorks2022.Production.BillOfMaterials"

    # Execute the query using pyodbc
    cursor = connection.cursor()
    cursor.execute(query)

    # Fetch the column names and data
    columns = [column[0] for column in cursor.description]
    data = cursor.fetchall()

    # Convert the data into a DataFrame
    df = pd.DataFrame.from_records(data, columns=columns)
finally:
    connection.close()

Inserting the table into MotherDuck

Now that the data has been loaded into a dataframe object, we can connect to MotherDuck and insert the table.

note

You will need to generate a token in your MotherDuck account. For production use cases, make sure to use a secret manager and never commit your token to your codebase.

import duckdb

motherduck_token = 'your_token'

# Attach using the MOTHERDUCK_TOKEN
duckdb.sql(f"ATTACH 'md:my_db?MOTHERDUCK_TOKEN={motherduck_token}'")

# Create or replace table in the attached database
duckdb.sql(
    """
    CREATE OR REPLACE TABLE my_db.main.BillOfMaterials AS
    SELECT * FROM df
    """
)

This will create the table, or replace it for the table already exists.

Handling More Complex Workflows

Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our ingestion partners, which includes many options including some that offer native python. An overview of the MotherDuck Ecosystem is shown below.

Diagram

SQL Server Authentication​

Reading a SQL Server table into a dataframe​

Inserting the table into MotherDuck​

Handling More Complex Workflows​

SQL Server Authentication

Reading a SQL Server table into a dataframe

Inserting the table into MotherDuck

Handling More Complex Workflows