Kymata Labs / The Living Indexes Built by tekvisions ↗
Recomputed daily from live GitHub signals

The data-centric AI stack, catalogued.

A living index of data-centric AI tooling — data labeling & annotation, synthetic data, curation & quality, augmentation, and dataset frameworks — ranked by momentum, not marketing.

About the Dataset Index

The Dataset Index is a living, self-updating directory of the open-source tools that build, label, synthesize, curate and serve machine-learning datasets. It tracks active tooling — not raw data dumps — and ranks every entry by momentum, recomputed daily from live GitHub signals. It is one of The Living Indexes, a fleet built and operated end-to-end by Kymata Labs' AI agents.

How is momentum scored?

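Momentum is a 0–100 score that blends three signals: log-scaled stars (55%), push recency (32%, decaying to zero by about 180 days after the last push), and rising newness (13%). Because recency carries nearly a third of the weight, a tool that shipped this week can outrank a bigger tool that has gone quiet.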
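
As a rough illustration of how a blend like this could be computed, here is a minimal Python sketch. Only the three weights and the roughly 180-day recency cutoff come from the description above. The star cap, the linear shape of the decay, the one-year newness window, and the names `momentum`, `STAR_CAP` and `NEWNESS_WINDOW_DAYS` are assumptions made for the example, not the index's actual formula.

```python
import math

# Weights as stated in the index description.
W_STARS, W_RECENCY, W_NEWNESS = 0.55, 0.32, 0.13

# Stated: push recency reaches zero at roughly 180 days.
RECENCY_WINDOW_DAYS = 180

# Assumptions for illustration only (not from the index):
STAR_CAP = 100_000         # hypothetical star count at which the star term saturates
NEWNESS_WINDOW_DAYS = 365  # hypothetical age below which a repo still counts as "new"


def _clamp01(x: float) -> float:
    """Clamp a value into the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def momentum(stars: int, days_since_push: float, repo_age_days: float) -> float:
    """Return a 0-100 score blending stars, push recency and newness.

    The linear decay used for recency and newness is an assumed shape;
    the source only says recency decays to zero by about 180 days.
    """
    star_term = _clamp01(math.log10(1 + max(stars, 0)) / math.log10(1 + STAR_CAP))
    recency_term = _clamp01(1.0 - days_since_push / RECENCY_WINDOW_DAYS)
    newness_term = _clamp01(1.0 - repo_age_days / NEWNESS_WINDOW_DAYS)
    blended = (
        W_STARS * star_term
        + W_RECENCY * recency_term
        + W_NEWNESS * newness_term
    )
    return round(100 * blended, 1)


# A smaller tool pushed today versus a larger tool quiet for 200 days.
active_small = momentum(stars=2_000, days_since_push=0, repo_age_days=730)
quiet_large = momentum(stars=20_000, days_since_push=200, repo_age_days=1_825)
print(active_small > quiet_large)
```

Under these assumed parameters the recently pushed 2,000-star tool scores higher than the 20,000-star tool that has been silent for 200 days, which matches the behaviour described above. With a different cap or decay shape the crossover point would move.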

What's included?

Six categories — Labeling & Annotation, Synthetic Data, Curation & Quality, Augmentation, Versioning & Frameworks, and Collections & Hubs — covering the data-centric AI workflow end to end.

How often is it updated?

Every day. A GitHub Action recomputes each tool's momentum and redeploys automatically, with no human in the loop — so the index reflects the ecosystem as it is today.
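
The index's pipeline is not published here, so the following is only a sketch of what one step of a daily recompute might look like: turning a GitHub repository payload into the raw signals a momentum score needs. The fields `stargazers_count`, `pushed_at` and `created_at` are part of GitHub's REST API repository response; the function names `parse_github_time` and `extract_signals`, the output keys, and the sample values are hypothetical.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def parse_github_time(value: str) -> datetime:
    """Parse a GitHub UTC timestamp such as '2024-06-01T00:00:00Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def extract_signals(repo: dict, now: datetime) -> dict:
    """Reduce a repository payload to star count, push age and repo age.

    `repo` is assumed to be the decoded JSON body of
    GET https://api.github.com/repos/{owner}/{repo}.
    """
    pushed = parse_github_time(repo["pushed_at"])
    created = parse_github_time(repo["created_at"])
    return {
        "stars": repo["stargazers_count"],
        "days_since_push": (now - pushed).total_seconds() / SECONDS_PER_DAY,
        "repo_age_days": (now - created).total_seconds() / SECONDS_PER_DAY,
    }


# Hypothetical payload, trimmed to the three fields used here.
sample = {
    "stargazers_count": 4200,
    "pushed_at": "2024-06-01T00:00:00Z",
    "created_at": "2023-06-12T00:00:00Z",
}
signals = extract_signals(sample, now=datetime(2024, 6, 11, tzinfo=timezone.utc))
print(signals)
```

Passing `now` explicitly rather than reading the clock inside the function keeps a scheduled run reproducible: every tool in one daily pass is measured against the same instant.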

Part of The Living Indexes

A fleet of self-updating maps of the AI-builder ecosystem — from RAG and diffusion to voice and fine-tuning. Explore them all at indexes.kymatalabs.com.