| Datasets & Tools | Description |
| --- | --- |
AIST++ | 3D keypoints with corresponding images for dance motions covering 10 dance genres |
AutoFlow | 40k image pairs with ground truth optical flow |
C4_200M | A 200 million sentence synthetic dataset for grammatical error correction |
CIFAR-5M | Dataset of ~6 million synthetic CIFAR-10–like images (32 × 32 RGB) |
Crisscrossed Captions | Set of semantic similarity ratings for the MS-COCO dataset |
Disfl-QA | Dataset of contextual disfluencies for information seeking |
Distilled Datasets | Distilled datasets from CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, and SVHN |
EvolvingRL | 1000 top-performing RL algorithms discovered through algorithm evolution |
GoEmotions | A human-annotated dataset of 58k Reddit comments labeled with 27 emotion categories |
H01 Dataset | 1.4-petabyte browsable reconstruction of a fragment of human cortex |
Know Your Data | Tool for understanding biases in a dataset |
Lens Flare | 5000 high-quality RGB images of typical lens flare |
More Inclusive Annotations for People (MIAP) | Improved bounding box annotations for a subset of the person class in the Open Images dataset |
Mostly Basic Python Problems | 1000 Python programming problems, each with a task description, code solution, and test cases |
NIH ChestX-ray14 dataset labels | Expert labels for a subset of the NIH ChestX-ray14 dataset |
Open Buildings | Locations and footprints of 516 million buildings with coverage across most of Africa |
Optical Polarization from Curie | 5GB of optical polarization data from the Curie submarine cable |
Readability Scroll | Scroll interactions of ~600 participants reading texts from the OneStopEnglish corpus |
RLDS | Tools to store, retrieve & manipulate episodic data for reinforcement learning |
Room-Across-Room (RxR) | Multilingual dataset for vision-and-language navigation in English, Hindi and Telugu |
Soft Attributes | ~6k sets of movie titles annotated with single English soft attributes |
TimeDial | Dataset of multiple choice span-filling tasks for temporal commonsense reasoning in dialog |
ToTTo | English table-to-text generation dataset with a controlled text generation task |
Translated Wikipedia Biographies | Dataset for analyzing common gender errors in neural machine translation (NMT) for English, Spanish, and German |
UI Understanding Data for UIBert | Datasets for two UI understanding tasks, AppSim & RefExp |
WikiFact | Wikipedia & Wikidata–based dataset to train relationship classifiers and fact extraction models |
WIT | Wikipedia-based Image Text dataset for multimodal multilingual ML |
Reference: https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html