Large Model Dataset

We supply audio, video, text, image, and 3D model data for training large models.

YouTube Video Dataset

Video files from well-known datasets or videos with specified IDs.

Well-known datasets

Such as YouTube-8M, Koala-36M, and Sports-1M

Your own list

Download based on your channel list or video list

Filter before downloading

Filter based on criteria such as video quality, release year, VBR, and duration before downloading
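
As a rough illustration of pre-download filtering, the sketch below applies thresholds to metadata records before anything is fetched. The `VideoMeta` fields, the default thresholds, and the reading of "VBR" as a minimum bitrate are all assumptions made for this example, not a description of the actual service.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VideoMeta:
    """Hypothetical metadata record; the field names are assumptions."""
    video_id: str
    height: int          # vertical resolution in pixels, e.g. 1080
    release_year: int
    bitrate_kbps: int    # average video bitrate
    duration_s: float


def keep(video: VideoMeta, min_height: int = 1080, min_year: int = 2015,
         min_bitrate_kbps: int = 2000, min_duration_s: float = 180.0) -> bool:
    """Return True only when the video meets every threshold."""
    return (video.height >= min_height
            and video.release_year >= min_year
            and video.bitrate_kbps >= min_bitrate_kbps
            and video.duration_s >= min_duration_s)


def filter_videos(videos: Iterable[VideoMeta], **criteria) -> List[VideoMeta]:
    """Keep the videos that pass `keep` with the given criteria."""
    return [v for v in videos if keep(v, **criteria)]


candidates = [
    VideoMeta("a", 1080, 2021, 4500, 600.0),
    VideoMeta("b", 720, 2022, 3000, 900.0),    # rejected: below 1080p
    VideoMeta("c", 2160, 2019, 12000, 95.0),   # rejected: shorter than 180 s
]
selected = filter_videos(candidates)
print([v.video_id for v in selected])  # prints ['a']
```

Filtering on metadata first means rejected videos never consume bandwidth or billable storage.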

Great value for money

$25/TB (based on data volume, storage region, project timeline, and payment method)

Twitch Video Dataset

Your own list

Download based on your channel list or video list

Filter before downloading.

Filter based on criteria such as video quality, release year, VBR, and duration before downloading.

Great value for money

$18/TB (based on data volume, storage region, project timeline, and payment method)

Podcast Audio Dataset

High-quality podcast audio with a guaranteed minimum sampling rate of 44 kHz and a minimum duration of 60 seconds per audio file

High Quality

Every file has a sampling rate of at least 44 kHz, and the dataset also includes higher-rate podcast audio, such as 96 kHz, 144 kHz, and 192 kHz.
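
For readers who want to spot-check delivered files against these thresholds, here is a minimal sketch using Python's standard `wave` module. It assumes uncompressed WAV input; podcast audio is often delivered as MP3 or AAC, which would need a separate decoder. The function and constant names are hypothetical.

```python
import io
import wave

MIN_SAMPLE_RATE_HZ = 44_000   # threshold stated in the text above
MIN_DURATION_S = 60.0         # threshold stated in the text above


def meets_audio_spec(path_or_file) -> bool:
    """Check a WAV file's sampling rate and duration against the thresholds."""
    with wave.open(path_or_file, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate >= MIN_SAMPLE_RATE_HZ and duration >= MIN_DURATION_S


def make_silent_wav(rate: int, seconds: int) -> io.BytesIO:
    """Build an in-memory mono 8-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(1)
        wav.setframerate(rate)
        wav.writeframes(b"\x80" * (rate * seconds))  # 0x80 is silence for 8-bit PCM
    buf.seek(0)
    return buf


print(meets_audio_spec(make_silent_wav(44_100, 61)))  # prints True
print(meets_audio_spec(make_silent_wav(22_050, 61)))  # prints False (rate too low)
print(meets_audio_spec(make_silent_wav(44_100, 30)))  # prints False (too short)
```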

Full Metadata

Metadata includes title, description, tags, duration, quality, and year

Great value for money

$30/TB (based on data volume, storage region, project timeline, and payment method)

Question Bank [HQ]

Includes questions, answers, and step-by-step analyses, covering chemistry diagrams, physics diagrams, mathematical diagrams, tables, and more

Clear Subjects

Including math, physics, engineering, chemistry, social sciences, and other disciplines

Complete Graphical Data

Note that graphics are not images: during preprocessing before training, graphical data can be converted into a format that computers can process more easily

Great value for money

$0.0007/row (based on data volume, storage region, project timeline, and payment method)

PUSH
Supports major protocols

Amazon S3, Cloudflare R2, Tigris, Alibaba OSS, ByteDance TOS, Tencent Cloud COS, or other object storage services

  • Supports cloud service providers in the China region

    Such as Alibaba OSS, ByteDance TOS, Tencent Cloud COS, HUAWEI Cloud OBS, Qiniu Cloud, and others

  • Compatible with any S3-protocol object storage service

    Push to any object storage service that supports the S3 protocol

  • Multi-region data storage capabilities

    Such as Tokyo, Salt Lake City, Amsterdam, Berlin, Shanghai, Singapore and other regions

  • Large bandwidth

    5 Gbps to 50 Gbps, depending on your needs
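
To relate bandwidth to delivery time, the following sketch converts between data volume, link speed, and duration. It assumes decimal units (1 TB = 10^12 bytes), a fully sustained link unless an `efficiency` factor is given, and a hypothetical 100 TB order; real throughput also depends on the destination storage service.

```python
SECONDS_PER_DAY = 86_400


def required_gbps(terabytes: float, seconds: float) -> float:
    """Sustained rate in Gbps needed to move `terabytes` within `seconds`."""
    bits = terabytes * 1e12 * 8
    return bits / seconds / 1e9


def transfer_days(terabytes: float, gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move `terabytes` over a `gbps` link at a given utilisation."""
    bits = terabytes * 1e12 * 8
    return bits / (gbps * 1e9 * efficiency) / SECONDS_PER_DAY


# Hypothetical 100 TB order at both ends of the advertised bandwidth range.
print(round(transfer_days(100, 5), 2))    # prints 1.85
print(round(transfer_days(100, 50), 2))   # prints 0.19
```

A lower `efficiency` value, for example 0.7, gives a more conservative estimate when the receiving bucket or network path cannot absorb the full line rate.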

FEEDBACK
He who pays the piper calls the tune

What Our Customers Say

Don't just take our word for it

  • 5.0

    The scale of our project is quite substantial, and we were also given a very tight timeline. Their team transfers approximately 130TB of data daily to the data center in the Netherlands—their efficiency is truly impressive. They completed all tasks in under 15 days, which saved us a significant amount of time.

    Nicholas W, AI PD (Amsterdam)

  • 4.8

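    Being able to customize filtering by video quality and duration really fits our team's requirements. We agreed in advance to keep only 1080p and above and to discard videos shorter than 180 seconds, and this pre-download filtering saved us a lot of time. We went straight to Alibaba Cloud in Singapore and negotiated a package deal, and the price was very good.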

    Cherry, Head of Large Models (Singapore)

  • 5.0

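    Our content is all sports video, including livestream replays. The collaboration went smoothly. During the project we provided more than a dozen batches of channel lists for scanning and filtering, with different types of video filtered by different quality and duration requirements. The object storage vendor we chose was not very good and gave us only 2 Gbps of storage bandwidth, so transfers were slow, but everything was finally completed after more than ten days. Many thanks for the patient support!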

    张瑾, Large Model Procurement (Beijing)

Frequently Asked Questions

Some questions about our services and data, as well as guidance to help you build the most cost-effective solution