Data reigns supreme 🥇
Every day it becomes more evident that data is the limiting factor for
state-of-the-art 📈 machine learning. Your model architecture may be
revolutionary, but without high-quality data 📊 to train on, it will be doomed
to mediocrity.
Pair your ideas with solid execution and use top-notch data in your next project!
We've combed through the 2384 papers accepted to NeurIPS in 2023 and compiled
a short-list of papers introducing exciting new datasets.

| Title | Tags | Paper | Dataset | Code |
|---|---|---|---|---|
| DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data | perceptual similarity, image, synthetic, diffusion, JND, 2AFC | | | |
| Visual Instruction Tuning | vision-language, llm, instruction-tuning, image, multimodal | | | |
| ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation | reward-model, image, text-to-image, synthetic, human-preference, alignment | | | |
| MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing | image-editing, synthetic, image, instruction | | | |
| REAL3D-AD | 3D, point-cloud, anomaly-detection | | | |

| Title | Tags | Paper | Dataset | Code |
|---|---|---|---|---|
| dacl10k: Benchmark for Semantic Bridge Damage Segmentation | image, semantic segmentation, classification, construction, defect | | | |

| Title | Tags | Paper | Dataset | Code |
|---|---|---|---|---|
| Satlas: A Large-Scale, Multi-Task Dataset for Remote Sensing Image Understanding | image, SAR, satellite, detection, climate | | | |
| Building3D: An Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds | 3D, point cloud | | | |
| EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding | image, object, ego | | | |
| Equivariant Similarity for Vision-Language Foundation Models | image, similarity, caption | | | |
| MOSE: A New Dataset for Video Object Segmentation in Complex Scenes | video, segmentation, tracking | | | |
| SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes | multi-object tracking, sports | | | |

We've combed through the 2359 papers accepted to CVPR in 2023 and compiled
a short-list of papers introducing exciting new datasets.

| Title | Tags | Paper | Dataset | Code |
|---|---|---|---|---|
| MVImgNet: A Large-scale Dataset of Multi-view Images | multi-view, image | | | |
| GeoNet: Benchmarking Unsupervised Adaptation across Geographies | geolocation, image | | | |
| Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset | denoising, image | | | |
| Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo | optical flow, stereo, image | | | |
| ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | image, editing | | | |
| ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data | RGB-D, segmentation, video | | | |
| Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification | low-light, cross-modal, IR | | | |
| JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking | pose estimation, image, keypoint, tracking | | | |
| A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation | synthetic, domain adaptation, supervised | | | |

| Title | Tags | Paper | Dataset | Code |
|---|---|---|---|---|
| Calving fronts and where to find them: a benchmark dataset and methodology for automatic glacier calving front extraction from synthetic aperture radar imagery | glacier, climate, SAR, satellite, image, semantic segmentation | | | |
| The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting | conservation, detection, SONAR, video, tracking, counting | | | |

| Title | Tags | Paper | Dataset | Code |
|---|---|---|---|---|
| ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases | x-ray, image, healthcare, detection | | | |

We would love your help in making this repository even better! If we missed a
paper that introduced a new dataset, or if you can think of any ways to improve
the repository, feel free to open an issue or a pull request.
This repository is inspired by paperswithcode,
and the template was adapted from
top-cvpr-2023-papers.