The Unsplash Dataset is composed of multiple TSV files:
The photos.tsv dataset has one row per photo. It contains properties of the photo, the name of the contributor, the image URL, and overall stats.
Field | Description |
---|---|
photo_id | ID of the Unsplash photo |
photo_url | Permalink URL to the photo page on unsplash.com |
photo_image_url | URL of the image file. Note: this is a dynamic URL, so you can apply resizing and customization operations directly on the image |
photo_submitted_at | Timestamp of when the photo was submitted to Unsplash |
photo_featured | Whether the photo was promoted to the Editorial feed or not |
photo_width | Width of the photo in pixels |
photo_height | Height of the photo in pixels |
photo_aspect_ratio | Aspect ratio of the photo |
photo_description | Description of the photo written by the photographer |
photographer_username | Username of the photographer on Unsplash |
photographer_first_name | First name of the photographer |
photographer_last_name | Last name of the photographer |
exif_camera_make | Camera make (brand) extracted from the EXIF data |
exif_camera_model | Camera model extracted from the EXIF data |
exif_iso | ISO setting of the camera, extracted from the EXIF data |
exif_aperture_value | Aperture setting of the camera, extracted from the EXIF data |
exif_focal_length | Focal length setting of the camera, extracted from the EXIF data |
exif_exposure_time | Exposure time setting of the camera, extracted from the EXIF data |
photo_location_name | Location of the photo |
photo_location_latitude | Latitude of the photo |
photo_location_longitude | Longitude of the photo |
photo_location_country | Country where the photo was made |
photo_location_city | City where the photo was made |
stats_views | Total # of times that a photo has been viewed on the Unsplash platform |
stats_downloads | Total # of times that a photo has been downloaded via the Unsplash platform |
ai_description | Textual description of the photo, generated by a 3rd party AI |
ai_primary_landmark_name | Landmark present in the photo, generated by a 3rd party AI |
ai_primary_landmark_latitude | Latitude of the landmark, generated by a 3rd party AI |
ai_primary_landmark_longitude | Longitude of the landmark, generated by a 3rd party AI |
ai_primary_landmark_confidence | Landmark confidence of the 3rd party AI |
blur_hash | BlurHash hash of the photo |
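
The note on photo_image_url says resizing operations can be applied directly on the URL. A minimal sketch of what that might look like, assuming the image URL accepts imgix-style query parameters such as `w` (width) and `q` (quality). The parameter names and the sample URL are assumptions for illustration, not something this dataset documentation specifies:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def resized_image_url(photo_image_url: str, width: int, quality: int = 80) -> str:
    """Return photo_image_url with width and quality query parameters added."""
    parts = urlparse(photo_image_url)
    # Keep any query parameters already present on the URL.
    query = dict(parse_qsl(parts.query))
    # ASSUMPTION: `w` and `q` are imgix-style parameter names.
    query.update({"w": str(width), "q": str(quality)})
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical URL, for illustration only.
print(resized_image_url("https://images.unsplash.com/photo-123", width=640))
```

Building the query with urllib rather than string concatenation keeps the URL valid whether or not it already carries parameters.
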
The keywords.tsv dataset has one row per photo-keyword pair. It contains data about how a keyword is connected to a photo and the conversions of the photo in our search engine for a particular keyword.
Field | Description |
---|---|
photo_id | ID of the Unsplash photo |
keyword | Keyword or search term |
ai_service_1_confidence | Confidence for the keyword from a 3rd party AI (0-100) |
ai_service_2_confidence | Confidence for the keyword from another 3rd party AI (0-100) |
suggested_by_user | Whether the keyword was added by a user (human) |
Note: A collection on Unsplash is a user-created grouping of photos. Collections are similar to boards on Pinterest and can often group photos in complex and creative ways.
The collections.tsv dataset has one row per photo-collection pair. Whenever a photo belongs to a collection created by a user, it will appear as one row. Each row describes when the photo was added to the collection and gives the title of the collection.
Field | Description |
---|---|
photo_id | ID of the Unsplash photo |
collection_id | ID of the Unsplash collection containing the photo |
collection_title | Title of the collection containing the photo |
photo_collected_at | Timestamp of when the photo was added to the collection |
Note: a conversion is currently defined as a user selecting an image to download it.
The conversions.tsv dataset has one row per search conversion. The dataset tells you which photo was downloaded for a search, the country of origin, and an anonymous identifier that distinguishes unique users. The data goes back up to 1 year before the release of each version of the dataset.
Field | Description |
---|---|
converted_at | Timestamp of the conversion event |
conversion_type | Type of conversion (download only for now) |
keyword | Keyword that was searched and led to the conversion |
photo_id | Photo ID of the photo that converted |
anonymous_user_id | Anonymous user ID |
conversion_country | Country code of the device geolocation |
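
Because each row is a single conversion event, counting rows per keyword and photo gives download totals, while anonymous_user_id separates repeat downloads from distinct users. A small sketch of that aggregation, using made-up rows and only the column names from the table above:

```python
from collections import defaultdict

# Made-up rows for illustration; real rows come from conversions.tsv.
conversions = [
    {"keyword": "dog", "photo_id": "p1", "anonymous_user_id": "u1"},
    {"keyword": "dog", "photo_id": "p1", "anonymous_user_id": "u1"},
    {"keyword": "dog", "photo_id": "p1", "anonymous_user_id": "u2"},
    {"keyword": "cat", "photo_id": "p3", "anonymous_user_id": "u3"},
]


def summarize_conversions(rows):
    """Map (keyword, photo_id) to (download count, distinct user count)."""
    downloads = defaultdict(int)
    users = defaultdict(set)
    for row in rows:
        key = (row["keyword"], row["photo_id"])
        downloads[key] += 1
        users[key].add(row["anonymous_user_id"])
    return {key: (downloads[key], len(users[key])) for key in downloads}


summary = summarize_conversions(conversions)
for (keyword, photo_id), (count, unique_users) in sorted(summary.items()):
    print(keyword, photo_id, count, unique_users)
```
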
Note: The coverage and score data come from a 3rd party AI.
The colors.tsv dataset has one row per major color present in the photo. The dataset tells you which colors are contained within a photo, their coverage as a percentage, and a score for how in focus the color is.
Field | Description |
---|---|
photo_id | ID of the Unsplash photo |
hex | Hexadecimal representation of the color |
red | Red component of the color in the RGB system |
green | Green component of the color in the RGB system |
blue | Blue component of the color in the RGB system |
keyword | Name of the closest color as a CSS color keyword |
coverage | Pixel coverage of the color as a percentage |
score | Score of the color in the photo (including the notion of focus) |
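
The hex field and the red, green, and blue fields describe the same color in two notations. The sketch below shows the conversion between them, assuming hex is a six-digit RRGGBB string (with or without a leading `#`) and the components are integers from 0 to 255. Both are assumptions about the file format, and the sample row is made up:

```python
def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert an RRGGBB hex string into (red, green, blue) integers."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected six hex digits, got {hex_color!r}")
    # Each pair of hex digits encodes one 0-255 component.
    red, green, blue = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue


# Made-up row using the column names from the table above.
row = {"hex": "FF8800", "red": 255, "green": 136, "blue": 0}
print(hex_to_rgb(row["hex"]) == (row["red"], row["green"], row["blue"]))
```
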
You can merge the different datasets through the primary key ID fields (usually the photo_id field). With this you'll be able to cross-reference properties from the photos dataset with data from the keywords or conversions dataset.
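
As one way to perform such a merge, here is a sketch using pandas (an assumed choice of library). The two inline TSV snippets are made-up stand-ins for photos.tsv and keywords.tsv and contain only a few of the documented columns:

```python
import io

import pandas as pd

# Made-up TSV snippets standing in for photos.tsv and keywords.tsv.
photos_tsv = (
    "photo_id\tphotographer_username\tstats_downloads\n"
    "p1\talice\t120\n"
    "p2\tbob\t45\n"
)
keywords_tsv = "photo_id\tkeyword\np1\tdog\np1\tpark\np2\tcat\n"

# sep="\t" tells pandas the files are tab-separated.
photos = pd.read_csv(io.StringIO(photos_tsv), sep="\t")
keywords = pd.read_csv(io.StringIO(keywords_tsv), sep="\t")

# Keep one row per photo-keyword pair and attach the photo's properties.
# validate raises an error if photo_id is not unique in the photos table.
merged = keywords.merge(photos, on="photo_id", how="left", validate="many_to_one")
print(merged[["photo_id", "keyword", "photographer_username", "stats_downloads"]])
```

With real files, the `io.StringIO(...)` arguments would be replaced by paths to the downloaded TSV files.
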
For help loading the dataset, see the how-to docs.