Skip to main content

PyPi Data

Explore the data

Interactive dashboard built on the DuckDB PyPI download stats. Use it as a starting point for your own Dives.

Loading live Dive preview...

About the dataset

PyPi is the Python Package Index, a repository of software packages for the Python programming language. It is a central repository that allows users to find and install software developed and shared by the Python community.

The dataset includes information about packages, releases, and downloads on the duckdb python package.

It's refreshed weekly and you can visit the DuckDB Stats dashboard.

How to query the dataset

A dedicated shared database is maintained to query the dataset.

aws-us-east-1 region only

This database is only available for accounts in the aws-us-east-1 region.

To attach it to your workspace, you can use the following command:

ATTACH 'md:_share/duckdb_stats/1eb684bf-faff-4860-8e7d-92af4ff9a410' AS duckdb_stats;

Example queries

The following queries assume that the current database connected is duckdb_stats. Run use duckdb_stats to switch to it.

Get weekly download stats

SELECT
DATE_TRUNC('week', download_date) AS week_start_date,
version,
country_code,
python_version,
SUM(daily_download_sum) AS weekly_download_sum
FROM
duckdb_stats.main.pypi_daily_stats
GROUP BY
ALL
ORDER BY
week_start_date

Schema

pypi_file_downloads

This table contains the raw data. Each row represents a download from PyPi.

column_namecolumn_typenull
timestampTIMESTAMPYES
country_codeVARCHARYES
urlVARCHARYES
projectVARCHARYES
fileSTRUCT(filename VARCHAR, project VARCHAR, "version" VARCHAR, "type" VARCHAR)YES
detailsSTRUCT("installer" STRUCT("name" VARCHAR, "version" VARCHAR), "python" VARCHAR, "implementation" STRUCT("name" VARCHAR, "version" VARCHAR), "distro" STRUCT("name" VARCHAR, "version" VARCHAR, "id" VARCHAR, "libc" STRUCT("lib" VARCHAR, "version" VARCHAR)), "system" STRUCT("name" VARCHAR, "release" VARCHAR), "cpu" VARCHAR, "openssl_version" VARCHAR, "setuptools_version" VARCHAR, "rustc_version" VARCHAR, "ci" BOOLEAN)YES
tls_protocolVARCHARYES
tls_cipherVARCHARYES

pypi_daily_stats

This table is a daily aggregation of the raw data. It contains the following columns:

column_namecolumn_typenull
load_idVARCHARYES
download_dateDATEYES
system_nameVARCHARYES
system_releaseVARCHARYES
versionVARCHARYES
projectVARCHARYES
country_codeVARCHARYES
cpuVARCHARYES
python_versionVARCHARYES
daily_download_sumBIGINTYES