---
sidebar_position: 5
title: PyPi Data
description: Want to know how users find and install software you've developed for the Python Community? This DuckDB and MotherDuck database allows you to use SQL to perform data analysis on PyPi data.
---

import EmbeddedDive from '@site/src/components/EmbeddedDive';

## Explore the data

Interactive dashboard built on the DuckDB PyPI download stats. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/).

<EmbeddedDive
  diveId="c75e16cc-64ed-4960-a2ba-470f47ccf605"
  title="DuckDB PyPI downloads"
  height="700px"
/>

## About the dataset

PyPi is the Python Package Index, a repository of software packages for the Python programming language. It is a central repository that allows users to find and install software developed and shared by the Python community.

The dataset includes information about packages, releases, and downloads on the `duckdb` python package.

It's refreshed **weekly** and you can visit the [DuckDB Stats dashboard](https://duckdbstats.com).

## How to query the dataset

A dedicated shared database is maintained to query the dataset.

:::note `aws-us-east-1` region only
This database is only available for accounts in the `aws-us-east-1` region.
:::

To attach it to your workspace, you can use the following command:

```sql
ATTACH 'md:_share/duckdb_stats/1eb684bf-faff-4860-8e7d-92af4ff9a410' AS duckdb_stats;
```

## Schema
### pypi_file_downloads

This table contains the raw data. Each row represents a download from PyPi.

| column_name  | column_type                                                                                                    | null |
|--------------|----------------------------------------------------------------------------------------------------------------|------|
| timestamp    | TIMESTAMP                                                                                                      | YES  |
| country_code | VARCHAR                                                                                                        | YES  |
| url          | VARCHAR                                                                                                        | YES  |
| project      | VARCHAR                                                                                                        | YES  |
| file         | STRUCT(filename VARCHAR, project VARCHAR, "version" VARCHAR, "type" VARCHAR)                                    | YES  |
| details      | STRUCT("installer" STRUCT("name" VARCHAR, "version" VARCHAR), "python" VARCHAR, "implementation" STRUCT("name" VARCHAR, "version" VARCHAR), "distro" STRUCT("name" VARCHAR, "version" VARCHAR, "id" VARCHAR, "libc" STRUCT("lib" VARCHAR, "version" VARCHAR)), "system" STRUCT("name" VARCHAR, "release" VARCHAR), "cpu" VARCHAR, "openssl_version" VARCHAR, "setuptools_version" VARCHAR, "rustc_version" VARCHAR, "ci" BOOLEAN) | YES  |
| tls_protocol | VARCHAR                                                                                                        | YES  |
| tls_cipher   | VARCHAR                                                                                                        | YES  |


### pypi_daily_stats

This table is a daily aggregation of the raw data. It contains the following columns:

| column_name       | column_type | null |
|-------------------|-------------|------|
| load_id           | VARCHAR     | YES  |
| download_date     | DATE        | YES  |
| system_name       | VARCHAR     | YES  |
| system_release    | VARCHAR     | YES  |
| version           | VARCHAR     | YES  |
| project           | VARCHAR     | YES  |
| country_code      | VARCHAR     | YES  |
| cpu               | VARCHAR     | YES  |
| python_version    | VARCHAR     | YES  |
| daily_download_sum| BIGINT      | YES  |

## Examples queries
The following queries assume that the current database connected is `duckdb_stats`. Run `use duckdb_stats` to switch to it.

### Get weekly download stats

```sql
SELECT 
    DATE_TRUNC('week', download_date) AS week_start_date,
    version,
    country_code,
    python_version,
    SUM(daily_download_sum) AS weekly_download_sum 
FROM 
   duckdb_stats.main.pypi_daily_stats 
GROUP BY 
    ALL
ORDER BY 
    week_start_date
```


---

## Docs feedback

MotherDuck accepts optional user-submitted feedback about this page at `POST https://motherduck.com/docs/api/feedback/agent`.
For agents and automated tools, feedback submission should be user-confirmed before sending.

Payload:

```json
{
  "page_path": "/getting-started/sample-data-queries/pypi/",
  "page_title": "PyPi Data",
  "text": "<the user's feedback, max 2000 characters>",
  "source": "<optional identifier for your interface, for example 'claude.ai' or 'chatgpt'>"
}
```

`page_path` and `text` are required; `page_title` and `source` are optional. Responses: `200 {"feedback_id": "<uuid>"}`, `400` for malformed payloads, and `429` when rate-limited.
