GAIR-NLP/ProX
by GAIR-NLP · Curation & Quality · updated 11mo ago
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
30 momentum · 269 stars · 18 forks · rank #217
Tags: continual · continual-pre-training · data-centric-ai · data-quality · llama · llm · mistral · neural-symbolic · pre-training