@opendatalab

OpenDataLab

OpenDataLab provides access to numerous significant open-source datasets.
 
OpenDataLab website | OpenXLab platform
 


OpenDataLab provides an ecosystem of high-quality datasets for the community. It offers:

Extensive open data resources

● A fast, simple way to access open datasets
● Large-scale open dataset resources
● 1200+ open datasets for computer vision and large models
● 200+ open datasets from CVPR
● Datasets categorized by hot topic

Open-source data processing toolkits

● Data acquisition toolkits that support large datasets
● Data acquisition toolkits that support many kinds of tasks
● Open-source intelligent toolbox for labeling

Dataset description language

● Format standardization
● DSDL: Dataset Description Language
● Define a CV dataset by DSDL
● 100+ CV datasets standardized by OpenDataLab

Check out our tutorial videos (in Chinese) to get started.


In September this year, we launched an upgraded feature that lets authors upload datasets on their own. We invite you to use it to promote your open-source datasets, AI research results, and more, so that more people can find, obtain, and use your datasets.

For an introduction to the self-service dataset upload feature, see the help doc. You can create and share your dataset by following our guidelines.

If you have any questions or run into obstacles, please feel free to contact us at OpenDataLab@pjlab.org.cn.

Popular repositories

  1. WanJuan1.0 Public

    WanJuan 1.0 multimodal corpus

    433 23

  2. labelU Public

    Data annotation toolbox that supports image, audio, and video data.

    Python 219 29

  3. LabelLLM Public

    TypeScript 114 9

  4. VIGC Public

    AAAI 2024: Visual Instruction Generation and Correction

    Python 72 3

  5. laion5b-downloader Public

    Python 70 5

  6. opendatalab-python-sdk Public

    SDK of OpenDataLab - https://opendatalab.org.cn

    Python 52 4

Repositories

Showing 10 of 27 repositories
  • Miner-PDF-Benchmark Public

    MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.

    Python 9 Apache-2.0 2 0 0 Updated Jul 1, 2024
  • MinerU Public

    MinerU is a one-stop, open-source data extraction tool that supports PDF, webpage, and e-book extraction.

    Python 22 AGPL-3.0 10 0 0 Updated Jul 1, 2024
  • PDF-Extract-Kit Public

    A Comprehensive Toolkit for High-Quality PDF Content Extraction

    Python 6 Apache-2.0 0 1 0 Updated Jul 1, 2024
  • LabelLLM Public
    TypeScript 114 Apache-2.0 9 5 0 Updated Jun 21, 2024
  • labelU Public

    Data annotation toolbox that supports image, audio, and video data.

    Python 219 29 4 0 Updated Jun 11, 2024
  • labelU-Kit Public

    Data annotation component library, provided as NPM packages

    TypeScript 37 Apache-2.0 10 1 0 Updated Jun 7, 2024
  • UniMERNet Public

    UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

    Jupyter Notebook 46 Apache-2.0 3 3 0 Updated Jun 6, 2024
  • CHARM Public

    [ACL 2024 Main Conference] Chinese commonsense benchmark for LLMs

    Python 20 Apache-2.0 2 0 0 Updated Jun 6, 2024
  • dsdl-sdk Public
    Jupyter Notebook 13 Apache-2.0 6 0 0 Updated May 29, 2024
  • dsdl-docs Public

    Data Set Description Language Specification (DSDL, a next-generation dataset description language for AI)

    HTML 42 Apache-2.0 6 0 0 Updated May 29, 2024
