pg_mooncake: Columnstore Tables with DuckDB Execution in Postgres

2024/10/30 - 4 min read

BY

Editor's note: this post is by a guest author, Pranav Aurora, who is a co-founder of mooncake

Another system promising analytics in Postgres?

pg_duckdb is officially in beta and has been positively received by the community. So, what’s the deal with pg_mooncake?

Well, DuckDB-powered analytics in Postgres must look and feel like Postgres.

pg_mooncake builds on this by introducing a native columnstore table to Postgres–supporting inserts, updates, joins, and soon, indexes. These tables are written as Iceberg or Delta tables (parquet files + metadata) in your object store. It leverages pg_duckdb for its vectorized execution engine and deploys it as a part of the extension. You can try it using their docker image, and running:

Copy code

docker run --name mooncake-demo -e POSTGRES_HOST_AUTH_METHOD=trust -d ccqmpux/demo-arm64 docker run -it --rm --link mooncake-demo:postgres ccqmpux/demo-arm64 psql -h postgres -U postgres

The combination of a columnar storage format and DuckDB execution, means pg_mooncake can deliver up to 1,000x faster analytics over regular Postgres tables. This performance is akin to running DuckDB on parquet files.

Here's now to create your columnstore table:

Copy code

CREATE extension pg_mooncake; CREATE TABLE sales ( id SERIAL PRIMARY KEY, product_name TEXT, quantity INT, price DECIMAL(10, 2) ) USING columnstore;

pg_mooncake is open source under a permissive MIT license. Mooncake Labs, the main contributors, are committed to keeping it that way and are grateful for the support from DuckDB Labs and the community. The extension is now available in preview on Neon.

So, when should I use pg_mooncake?

pg_mooncake gives developers a native table experience for working with columnar format data in Postgres. This enables two key scenarios:

1. Analytics on live datasets

You can run transactional and batch inserts, updates, and deletes directly on these columnstore tables.

Copy code

BEGIN; INSERT INTO sales (id, product_name, quantity, price) VALUES (3, 'Tablet', 30, 600); UPDATE sales SET quantity = 25 WHERE id = 3; COMMIT;

Run your OLAP queries on these tables with up-to-date data:

Copy code

SELECT SUM(quantity * price) AS total_sales_amount FROM sales;

2. Writing Postgres data to Delta Lake and Iceberg Tables

Instead of backing your Postgres tables as ad-hoc Parquet files, you can write directly to open tables (Iceberg and Delta) in your object store. You can also query these tables using DuckDB outside of Postgres.

Soon, we’ll also provide a path to sync your existing Iceberg and Delta tables with your Postgres columnstore tables.

How DuckDB enabled us to ship the extension in 60 days.

In just 60 days of hands-on keyboard, we were able to ship a clean and efficient system, thanks to the DuckDB execution engine. We didn’t have to modify the engine, and its performance is outstanding right out of the box. We're extremely grateful for the DuckDB community and foundation for making this possible.

DuckDB is the default execution engine for our columnstore tables, shipped as part of the pg_mooncake extension. pg_mooncake is modular and will seamlessly integrate future updates to pg_duckdb, and performance is comparable to running pg_duckdb on Parquet.

We initially set out to implement pg_mooncake fully within Postgres but ultimately developed it as a DuckDB storage extension.This is how we shipped insert, update and deletes on columnstore tables.

It was the duck that enabled the Mooncake.

What's the MotherDuck connection?

Since Mooncake writes its columnstore tables in Delta or Iceberg format, you can query these tables outside of Postgres using MotherDuck or DuckDB without needing to stitch together files or wrangle with DataFrames –– just pass it the directory on your filesystem.

Copy code

SELECT * FROM delta_scan('s3://some/delta/table');

For larger datasets, when running queries in your product Postgres database isn’t ideal, MotherDuck offloads the workload to the cloud—allowing users to run the same queries with improved performance, without any modifications.

pg_mooncake will support this pattern more natively in the future – taking advantage of MotherDuck's “dual execution” capabilities to offload queries to the cloud.

Mooncake and MotherDuck form a powerful pair for online, user-facing analytics and offline warehousing-type analytics, all leveraging a single copy of data exposed as a table in Postgres. No CDC, ETL, or pipelines, just ship your roadmap.

pg_mooncake is officially in preview.

pg_duckdb was officially announced in beta last week, and pg_mooncake is now in preview today. The duck and the elephant are joining together nicely. Over a Mooncake to share.

Check the extension repository for more information, and start playing with it on your Neon and MotherDuck accounts

CONTENT
  1. Another system promising analytics in Postgres?
  2. So, when should I use pg_mooncake?
  3. How DuckDB enabled us to ship the extension in 60 days.
  4. What's the MotherDuck connection?
  5. pg_mooncake is officially in preview.

Start using MotherDuck now!

blog subscription icon

Subscribe to motherduck blog