Multithreading and parallelism with JDBC
Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create connections in multiple threads. For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use connection pooling.
Connections in multiple threads
If you have multiple parallelizable queries you want to run in quick succession, you could benefit from concurrency.
Concurrency is supported by DuckDB, across multiple threads, as described in the Concurrency documentation page. However, be mindful when using this approach, as parallelism does not always lead to better performance. Read the notes on Parallelism in the DuckDB documentation to understand the specific scenarios in which concurrent queries can be beneficial.
First, let's create a class MultithreadingExample
and get the MotherDuck token from your environment variables.
package com.example;
import org.duckdb.DuckDBConnection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
/**
* Examples for multithreading and connection pooling
*/
public class MultithreadingExample {
private static final String token = System.getenv("motherduck_token");
private final static Logger logger = LoggerFactory.getLogger(MultithreadingExample.class);
To use multiple threads, pass the connection object to each thread, and create a copy of the connection with the .duplicate()
method to run a query:
private static void runQueryFromThread(String label, Connection conn, String query) {
try (Connection dupConn = ((DuckDBConnection) conn).duplicate();
Statement st = dupConn.createStatement();
ResultSet rs = st.executeQuery(query)) {
if (rs.next()) {
logger.info("{}: found at least one row", label);
} else {
logger.info("{}: no rows found", label);
}
} catch (SQLException e) {
throw new RuntimeException("can't run query", e);
}
}
You can then use a thread pool executor to run the queries using the runQueryFromThread
method:
public static void main(String[] args) throws SQLException, InterruptedException {
// Check that a motherduck_token exists
if (token == null) {
throw new IllegalArgumentException(
"Please provide `motherduck_token` environment variable");
}
// Add MotherDuck token to config
Properties config = new Properties();
config.setProperty("motherduck_token", token);
// Create list of queries to run in multiple threads
List<String> queries = new ArrayList<>();
queries.add("SELECT 42;");
queries.add("SELECT 'Hello World!';");
int num_queries = queries.size();
// Create thread pool executor and run queries
ExecutorService executor = Executors.newFixedThreadPool(num_queries);
try (Connection mdConn = DriverManager.getConnection("jdbc:duckdb:md:my_db", config);) {
for (int i = 0; i < num_queries; i++) {
String label = "query " + i;
String query = queries.get(i);
executor.submit(() -> runQueryFromThread(label, mdConn, query));
}
executor.shutdown();
boolean success = executor.awaitTermination(30, TimeUnit.SECONDS);
}
if (success) {
logger.info("successfully ran {} queries in threads", num_queries);
}
}
}
Connection pooling
If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts.
For connection pools, we recommend using HikariCP. Below is an example implementation. For this implementation, you can connect to a user account by providing a motherduck_token
in your database path.
The goal of this implementation is to distribute operations across multiple databases in a round-robin fashion. This HikariMultiPoolManager
class manages multiple HikariDataSource
s (connection pools) which each connect to a different connection url, and rotates between them when getConnection()
is called. You can specify a pool size which is applied to all HikariDataSource
s.
package com.example;
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import org.duckdb.DuckDBConnection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
/**
* Example DuckDB connection pool implementation
*/
public class HikariMultiPoolManager implements AutoCloseable {
private static final String token = System.getenv("motherduck_token");
private final List<HikariDataSource> dataSources;
private final AtomicInteger index;
private final static Logger logger = LoggerFactory.getLogger(HikariMultiPoolManager.class);
public HikariMultiPoolManager(List<String> urls, int maximumPoolSize) {
// Create Hikari datasources from urls
this.dataSources = new ArrayList<>();
for (String url : urls) {
HikariDataSource ds = new HikariDataSource();
ds.setMaximumPoolSize(maximumPoolSize);
ds.setJdbcUrl(url);
dataSources.add(ds);
}
this.index = new AtomicInteger(0);
}
public Connection getConnection() throws SQLException {
int ind = index.getAndIncrement() % dataSources.size();
HikariDataSource ds = dataSources.get(ind);
return ds.getConnection();
}
public void evict() throws Exception {
for (HikariDataSource ds : dataSources) {
HikariPoolMXBean poolBean = ds.getHikariPoolMXBean();
if (poolBean != null) {
poolBean.softEvictConnections();
}
}
}
@Override
public void close() throws Exception {
for (HikariDataSource ds : dataSources) {
ds.close();
}
}
How to set urls
The HikariMultiPoolManager
takes a list of urls
and an optional input argument maximumPoolSize
(defaults to 1). Each path in the list will get a HikariDataSource
in the pool, that readers can use to query the database(s) they connect to. If you have a maximumPoolSize
that is larger than 1, the pool will return thread-safe copies of those connections. This gives you a few options on how to configure the pool.
To learn more about database instances and connections, see Connect to multiple databases.
- One database
- One database with >1 instances
- >1 Databases on one account
- >1 Databases on >1 accounts
To create a connection pool with 3 connections to the same database, you can pass a single database path, and set maximumPoolSize=3
:
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 3);
Set access_mode=read_only
to avoid potential write conflicts. This is especially important if your maximumPoolSize
is larger than the number of databases.
You can also create multiple connections to the same database using different DuckDB instances. However, keep in mind that each connection takes time to establish. Create multiple paths and make them unique by adding &user=<n>
to the database path:
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=1");
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=2");
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=3");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
Set access_mode=read_only
to avoid potential write conflicts. This is especially important if your pool_size
is larger than the number of databases.
You can also create multiple connections to separate databases in the same MotherDuck account using different DuckDB instances. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path:
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db1?motherduck_token=" + token + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db2?motherduck_token=" + token + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db3?motherduck_token=" + token + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
Set access_mode=read_only
to avoid potential write conflicts. This is especially important if your pool_size
is larger than the number of databases.
You can also create multiple connections to separate databases in separate MotherDuck accounts using different DuckDB instances. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path:
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db1?motherduck_token=" + token1 + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db2?motherduck_token=" + token2 + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db3?motherduck_token=" + token3 + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
Set access_mode=read_only
to avoid potential write conflicts. This is especially important if your pool_size
is larger than the number of databases.
How to run queries with a thread pool
You can then fetch connections from the pool, for example, to run queries from a queue. You can use a ThreadPoolExecutor
with 3 workers to fetch connections from the pool and run the queries using a run_query
function:
private static String queryString(HikariMultiPoolManager pool, String query) throws SQLException {
try (Connection conn = pool.getConnection();
Statement ps = conn.createStatement();
ResultSet rs = ps.executeQuery(query)) {
logger.info("connection = {}", conn);
String res = rs.next() ? rs.getString(1) : "[not found]";
logger.info("Got: {}", res);
return res;
}
}
public static void main(String[] args) throws Exception {
if (token == null) {
throw new IllegalArgumentException(
"Please provide `motherduck_token` environment variable");
}
List<String> queries = new ArrayList<>();
// Add queries here
// Example:
queries.add("SELECT 42;");
queries.add("SELECT 'Hello World!';");
List<String> urls = new ArrayList<>();
// Add urls here
// Example:
urls.add("jdbc:duckdb:md:my_db?user=1&motherduck_token=" + token);
urls.add("jdbc:duckdb:md:my_db?user=2&motherduck_token=" + token);
urls.add("jdbc:duckdb:md:my_db?user=3&motherduck_token=" + token);
// Create thread pool and run queries
try(HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);) {
ExecutorService executor = Executors.newFixedThreadPool(urls.size());
for (String query : queries) {
executor.submit(() -> queryString(pool, query));
}
executor.shutdown();
boolean success = executor.awaitTermination(30, TimeUnit.SECONDS);
if (success) {
logger.info("successfully ran {} queries in threads with connection pool", queries.size());
}
}
}
}
Reset the connection pool at least once every 24 hours, by soft evicting all connections. This ensures that you are always running on the latest version of MotherDuck.
pool.evict()