GAIR-NLP/ProX

by GAIR-NLP · Curation & Quality · updated 11mo ago

[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale

Momentum: 30 · Stars: 269 · Forks: 18 · Rank: #217
Tags: continual · continual-pre-training · data-centric-ai · data-quality · llama · llm · mistral · neural-symbolic · pre-training