<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>The Dataset Index</title>
    <link>https://dataset.kymatalabs.com</link>
    <description>The living index of data-centric AI tooling — labeling, synthetic data, curation, augmentation, dataset frameworks.</description>
    <item><title>HumanSignal/label-studio — momentum 87</title><link>https://dataset.kymatalabs.com/p/humansignal-label-studio/</link><guid isPermaLink="false">HumanSignal/label-studio</guid><description>Label Studio is a multi-type data labeling and annotation tool with standardized output format</description></item>
    <item><title>dolthub/dolt — momentum 86</title><link>https://dataset.kymatalabs.com/p/dolthub-dolt/</link><guid isPermaLink="false">dolthub/dolt</guid><description>Dolt – Git for Data</description></item>
    <item><title>huggingface/datasets — momentum 86</title><link>https://dataset.kymatalabs.com/p/huggingface-datasets/</link><guid isPermaLink="false">huggingface/datasets</guid><description>🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools</description></item>
    <item><title>joke2k/faker — momentum 85</title><link>https://dataset.kymatalabs.com/p/joke2k-faker/</link><guid isPermaLink="false">joke2k/faker</guid><description>Faker is a Python package that generates fake data for you.</description></item>
    <item><title>stefan-jansen/machine-learning-for-trading — momentum 85</title><link>https://dataset.kymatalabs.com/p/stefan-jansen-machine-learning-for-trading/</link><guid isPermaLink="false">stefan-jansen/machine-learning-for-trading</guid><description>Code for Machine Learning for Algorithmic Trading, 2nd edition.</description></item>
    <item><title>cvat-ai/cvat — momentum 84</title><link>https://dataset.kymatalabs.com/p/cvat-ai-cvat/</link><guid isPermaLink="false">cvat-ai/cvat</guid><description>Computer Vision Annotation Tool (CVAT) is a leading platform for building high-quality visual datasets for vision AI. It offers open-source, cloud, and enterprise products, as well as labeling services, for image, video, and 3D annotation with AI-assisted labeling, quality assurance, team collaborat</description></item>
    <item><title>treeverse/dvc — momentum 83</title><link>https://dataset.kymatalabs.com/p/treeverse-dvc/</link><guid isPermaLink="false">treeverse/dvc</guid><description>🦉 Data Versioning and ML Experiments</description></item>
    <item><title>simonw/datasette — momentum 82</title><link>https://dataset.kymatalabs.com/p/simonw-datasette/</link><guid isPermaLink="false">simonw/datasette</guid><description>An open source multi-tool for exploring and publishing data</description></item>
    <item><title>voxel51/fiftyone — momentum 82</title><link>https://dataset.kymatalabs.com/p/voxel51-fiftyone/</link><guid isPermaLink="false">voxel51/fiftyone</guid><description>Refine high-quality datasets and visual AI models</description></item>
    <item><title>datajuicer/data-juicer — momentum 79</title><link>https://dataset.kymatalabs.com/p/datajuicer-data-juicer/</link><guid isPermaLink="false">datajuicer/data-juicer</guid><description>Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷</description></item>
    <item><title>snorkel-team/snorkel — momentum 78</title><link>https://dataset.kymatalabs.com/p/snorkel-team-snorkel/</link><guid isPermaLink="false">snorkel-team/snorkel</guid><description>A system for quickly generating training data with weak supervision</description></item>
    <item><title>NVIDIA/DALI — momentum 78</title><link>https://dataset.kymatalabs.com/p/nvidia-dali/</link><guid isPermaLink="false">NVIDIA/DALI</guid><description>A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.</description></item>
    <item><title>treeverse/lakeFS — momentum 78</title><link>https://dataset.kymatalabs.com/p/treeverse-lakefs/</link><guid isPermaLink="false">treeverse/lakeFS</guid><description>lakeFS - Data version control for your data lake | Git for data</description></item>
    <item><title>argilla-io/argilla — momentum 77</title><link>https://dataset.kymatalabs.com/p/argilla-io-argilla/</link><guid isPermaLink="false">argilla-io/argilla</guid><description>Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets</description></item>
    <item><title>torchgeo/torchgeo — momentum 77</title><link>https://dataset.kymatalabs.com/p/torchgeo-torchgeo/</link><guid isPermaLink="false">torchgeo/torchgeo</guid><description>TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data</description></item>
    <item><title>ConardLi/easy-dataset — momentum 76</title><link>https://dataset.kymatalabs.com/p/conardli-easy-dataset/</link><guid isPermaLink="false">ConardLi/easy-dataset</guid><description>A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval</description></item>
    <item><title>tensorflow/datasets — momentum 76</title><link>https://dataset.kymatalabs.com/p/tensorflow-datasets/</link><guid isPermaLink="false">tensorflow/datasets</guid><description>TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...</description></item>
    <item><title>OpenCSGs/csghub — momentum 76</title><link>https://dataset.kymatalabs.com/p/opencsgs-csghub/</link><guid isPermaLink="false">OpenCSGs/csghub</guid><description>CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with Python SDK compatibility with Hugg</description></item>
    <item><title>sdv-dev/SDV — momentum 76</title><link>https://dataset.kymatalabs.com/p/sdv-dev-sdv/</link><guid isPermaLink="false">sdv-dev/SDV</guid><description>Synthetic data generation for tabular data</description></item>
    <item><title>argilla-io/distilabel — momentum 75</title><link>https://dataset.kymatalabs.com/p/argilla-io-distilabel/</link><guid isPermaLink="false">argilla-io/distilabel</guid><description>Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.</description></item>
    <item><title>logpai/loghub — momentum 75</title><link>https://dataset.kymatalabs.com/p/logpai-loghub/</link><guid isPermaLink="false">logpai/loghub</guid><description>A large collection of system log datasets for AI-driven log analytics [ISSRE'23]</description></item>
    <item><title>unsplash/datasets — momentum 74</title><link>https://dataset.kymatalabs.com/p/unsplash-datasets/</link><guid isPermaLink="false">unsplash/datasets</guid><description>🎁 7,400,000+ Unsplash images made available for research and machine learning</description></item>
    <item><title>TorchIO-project/torchio — momentum 74</title><link>https://dataset.kymatalabs.com/p/torchio-project-torchio/</link><guid isPermaLink="false">TorchIO-project/torchio</guid><description>Medical imaging processing for AI applications.</description></item>
    <item><title>cuevhv/mamma — momentum 74</title><link>https://dataset.kymatalabs.com/p/cuevhv-mamma/</link><guid isPermaLink="false">cuevhv/mamma</guid><description>Official code for MAMMA: Markerless Accurate Multi-person Motion Acquisition.</description></item>
    <item><title>synthetichealth/synthea — momentum 73</title><link>https://dataset.kymatalabs.com/p/synthetichealth-synthea/</link><guid isPermaLink="false">synthetichealth/synthea</guid><description>Synthetic Patient Population Simulator</description></item>
    <item><title>imaNNeo/fl_chart — momentum 72</title><link>https://dataset.kymatalabs.com/p/imanneo-fl-chart/</link><guid isPermaLink="false">imaNNeo/fl_chart</guid><description>FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, Radar Chart and Candlestick Chart.</description></item>
    <item><title>diffgram/diffgram — momentum 72</title><link>https://dataset.kymatalabs.com/p/diffgram-diffgram/</link><guid isPermaLink="false">diffgram/diffgram</guid><description>The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.</description></item>
    <item><title>bespokelabsai/curator — momentum 72</title><link>https://dataset.kymatalabs.com/p/bespokelabsai-curator/</link><guid isPermaLink="false">bespokelabsai/curator</guid><description>Synthetic data curation for post-training and structured data extraction</description></item>
    <item><title>intellicia-public/parastore — momentum 72</title><link>https://dataset.kymatalabs.com/p/intellicia-public-parastore/</link><guid isPermaLink="false">intellicia-public/parastore</guid><description>Draw a store, generate LLM personas, and watch them shop — an isometric 3D sandbox for synthetic-consumer experiments.</description></item>
    <item><title>doccano/doccano — momentum 71</title><link>https://dataset.kymatalabs.com/p/doccano-doccano/</link><guid isPermaLink="false">doccano/doccano</guid><description>Open source annotation tool for machine learning practitioners.</description></item>
  </channel>
</rss>
