谷歌发布的数据集

Datasets & Tools Description
AIST++ 3D keypoints with corresponding images for dance motions covering 10 dance genres
AutoFlow 40k image pairs with ground truth optical flow
C4_200M A 200 million sentence synthetic dataset for grammatical error correction
CIFAR-5M Dataset of ~6 million synthetic CIFAR-10–like images (RGB 32 x 32 pix)
Crisscrossed Captions Set of semantic similarity ratings for the MS-COCO dataset
Disfl-QA Dataset of contextual disfluencies for information seeking
Distilled Datasets Distilled datasets from CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, and SVHN
EvolvingRL 1000 top performing RL algorithms discovered through algorithm evolution
GoEmotions A human-annotated dataset of 58k Reddit comments labeled with 27 emotion categories
H01 Dataset 1.4 petabyte browsable reconstruction of the human cortex
Know Your Data Tool for understanding biases in a dataset
Lens Flare 5000 high-quality RGB images of typical lens flare
More Inclusive Annotations for People (MIAP) Improved bounding box annotations for a subset of the person class in the Open Images dataset
Mostly Basic Python Problems 1000 Python programming problems, incl. task description, code solution & test cases
NIH ChestX-ray14 dataset labels Expert labels for a subset of the NIH ChestX-ray14 dataset
Open Buildings Locations and footprints of 516 million buildings with coverage across most of Africa
Optical Polarization from Curie 5GB of optical polarization data from the Curie submarine cable
Readability Scroll Scroll interactions of ~600 participants reading texts from the OneStopEnglish corpus
RLDS Tools to store, retrieve & manipulate episodic data for reinforcement learning
Room-Across-Room (RxR) Multilingual dataset for vision-and-language navigation in English, Hindi and Telugu
Soft Attributes ~6k sets of movie titles annotated with single English soft attributes
TimeDial Dataset of multiple choice span-filling tasks for temporal commonsense reasoning in dialog
ToTTo English table-to-text generation dataset with a controlled text generation task
Translated Wikipedia Biographies Dataset for analysis of common gender errors in?NMT?for English, Spanish and German
UI Understanding Data for UIBert Datasets for two UI understanding tasks, AppSim & RefExp
WikiFact Wikipedia & WikiData–based dataset to train relationship classifiers and fact extraction models
WIT Wikipedia-based Image Text dataset for multimodal multilingual ML

 

 

参考:https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html

发表评论