- jupyter notebooks for visualizing Coast Train data and metadata
- scripts for computing inter-labeler agreement and making montage figures
- works with Doodler and Segmentation Gym
- models trained using some Coast Train version 1 data sets are included in Segmentation Zoo
This repository contains Jupyter notebooks and Python scripts to create the analyses and plots in Buscombe et al., "'Coast Train', a 1.2 Billion Pixel Human-Labeled Dataset for Data-Driven Classification of Coastal Environments", in review.
Note that this repository contains code only. The code recreates the plots in the aforementioned paper and provides a programmatic way to query and search the dataset for custom applications. For details about how to access the Coast Train version 1 data themselves, please refer to the Coast Train website, which explains where to download the data and how to unpack them using the companion program Doodler.
Package maintainers:
git clone --depth 1 https://github.com/dbuscombe-usgs/CoastTrainMetaPlots.git
In the terminal:
conda env create --file env/coasttrain.yml
When the environment is installed (this may take a while), activate it like this:
conda activate coasttrain
We also advise creating the Doodler conda environment to run the programs. See the installation instructions.
The metadata files are the same as those provided in the official data release but are included here for convenience. Please refer to the Coast Train website, which explains where to download the data and how to unpack them using the companion program Doodler. The csv files contain the following fields:
Variable | Description |
---|---|
‘annotation_image_filename’ | npz format file containing the label data archive |
‘classes_array’ | names of possible classes in this dataset |
‘classes_integer’ | one integer per element in ‘classes_array’ |
‘classes_present_array’ | names of the classes present in this image |
‘classes_present_integer’ | one integer per element in ‘classes_present_array’ |
‘pen_width’ | final width in pixels of pen used to annotate |
‘CRF_theta’, ‘CRF_mu’ , ‘CRF_downsample_factor’, ‘Classifier_downsample_factor’, ‘prob_of_unary_potential’, ‘num_of_scales’ | internal classifier hyperparameters used by the Doodler program. |
‘num_classes’ | number of possible classes in this dataset |
‘doodle_spatial_density’ | proportion of the image annotated |
‘acc_georef’ | accuracy in meters of the specification of ‘XMin, XMax’ and ‘YMin, YMax’ |
‘epsg’ | EPSG code of the projected coordinate system ‘CRS’ |
‘year, month, day’ | time variables |
‘hour, minute, second’ | time variables |
‘XMin, XMax’ | minimum and maximum Easting of image footprint |
‘YMin, YMax’ | minimum and maximum Northing of image footprint |
‘LonMin, LonMax’ | minimum and maximum Longitude (WGS84) of image footprint |
‘LatMin, LatMax’ | minimum and maximum Latitude (WGS84) of image footprint |
‘CRS’ | description of the projected coordinate system to which ‘XMin, XMax’ and ‘YMin, YMax’ refer |
‘px_size_m’ | horizontal size of pixel in meters |
‘ImageHeightPx’, ‘ImageWidthPx’, ‘ImageBands’ | image height (Y) and width (X) in pixels, and the number of bands (always 3) |
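
As an illustration of querying these fields programmatically, here is a minimal sketch that selects records whose footprint overlaps a longitude/latitude box. It assumes that ‘LonMin’, ‘LonMax’, ‘LatMin’ and ‘LatMax’ are stored as separate csv columns; the sample rows and the helper name `records_in_bbox` are made up for the example and are not part of this repository.

```python
import csv
import io

# Tiny in-memory stand-in for one metadata csv file (all values are invented).
SAMPLE_CSV = """annotation_image_filename,LonMin,LonMax,LatMin,LatMax,doodle_spatial_density
a.npz,-124.10,-124.00,40.70,40.80,0.12
b.npz,-75.60,-75.50,35.20,35.30,0.30
c.npz,-124.05,-123.95,40.75,40.85,0.05
"""


def records_in_bbox(rows, lon_min, lon_max, lat_min, lat_max):
    """Return the rows whose image footprint overlaps the query box (WGS84 degrees)."""
    hits = []
    for row in rows:
        # Two intervals overlap when each one starts before the other ends.
        overlaps_lon = float(row["LonMin"]) <= lon_max and float(row["LonMax"]) >= lon_min
        overlaps_lat = float(row["LatMin"]) <= lat_max and float(row["LatMax"]) >= lat_min
        if overlaps_lon and overlaps_lat:
            hits.append(row)
    return hits


# To read a real file, replace io.StringIO(SAMPLE_CSV) with open(<path to csv>, newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
hits = records_in_bbox(rows, -125.0, -124.0, 40.0, 41.0)
hit_names = [row["annotation_image_filename"] for row in hits]
print(hit_names)
```

The same pattern extends to other fields, for example keeping only rows whose ‘doodle_spatial_density’ exceeds a threshold.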
Notebooks that read metadata files in the metadata folder can be run by launching a jupyter server in your terminal:
cd notebooks
jupyter notebook
Allows analysis of the class-image distributions for each data record in turn and overall. Generates the following plots:
- plots/NumLabel_all_datarecords_per_superlabel.png
- plots/Num_images_per_datarecord_containing_superclass.png
- plots/Num_images_per_datarecord_containing_class.png
Allows analysis of the geographic-image distributions for each data record in turn and overall. Generates the following plots:
- plots/Map_satellite_imagery_folium.png
- plots/All_imagery_by_lat_and_lon.png
Allows analysis of the anonymized labeler-image distributions for each data record in turn and overall. Generates the following plots:
- plots/Label_all_million_pixels_datarecords_per_ID.png
- plots/Label_per_datarecord_per_ID.png
- plots/Label_all_datarecords_per_ID.png
- plots/Million_pixels_vs_percentage_doodled.png
- plots/agreement_stats_coasttrain_naip_s2.png
This notebook allows you to visualize where each image is located on a map, one by one.
Scripts for computing inter-labeler agreement and making montage figures are run from the command line. Before running them, edit the paths in each script so that they point to the locations on your local filesystem where you have downloaded the Coast Train npz files.
cd scripts
python labeler_agreement.py
Generates the following plots:
- script_plots/agreement_stats_coasttrain_naip_s2.png
- script_plots/agreement_stats_coasttrain_naip_s2_IOU.png
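For readers unfamiliar with agreement statistics, the sketch below shows one common way to compare two labelers' label images of the same scene: the fraction of pixels that match, and the mean intersection-over-union (IoU) across classes. This is an illustration only and is not taken from labeler_agreement.py; the function name `agreement_stats` and the toy arrays are assumptions.

```python
import numpy as np


def agreement_stats(label_a, label_b, num_classes):
    """Return (pixel agreement, mean IoU) for two integer label images of equal shape."""
    label_a = np.asarray(label_a)
    label_b = np.asarray(label_b)
    if label_a.shape != label_b.shape:
        raise ValueError("label images must have the same shape")

    # Fraction of pixels to which both labelers assigned the same class.
    pixel_agreement = float(np.mean(label_a == label_b))

    ious = []
    for c in range(num_classes):
        in_a = label_a == c
        in_b = label_b == c
        union = np.logical_or(in_a, in_b).sum()
        if union == 0:
            continue  # class absent from both label images; leave it out of the mean
        ious.append(np.logical_and(in_a, in_b).sum() / union)

    mean_iou = float(np.mean(ious)) if ious else float("nan")
    return pixel_agreement, mean_iou


# Toy 2 x 3 label images from two hypothetical labelers.
a = np.array([[0, 0, 1], [1, 2, 2]])
b = np.array([[0, 1, 1], [1, 2, 2]])
agree, miou = agreement_stats(a, b, num_classes=3)
print(agree, miou)
```

Classes that neither labeler used are skipped so that they do not inflate or deflate the mean.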
cd scripts
python plot_montage.py
Produces a montage of example imagery, labels, and overlay masks for each of the datasets, generating the following figures:
- script_plots/example_coasttrain_naip.png
- script_plots/example_coasttrain_naip6class.png
- script_plots/example_coasttrain_quads.png
- script_plots/example_coasttrain_madeira.png
- script_plots/example_coasttrain_dauphin.png
- script_plots/example_coasttrain_sandwich.png
- script_plots/example_coasttrain_s2.png
- script_plots/example_coasttrain_s2_4class.png
- script_plots/example_coasttrain_l8.png
- script_plots/example_coasttrain_l8elwha.png
cd scripts
python plot_montage_remapped.py
Produces a montage of example imagery, labels, and overlay masks for each of the datasets remapped into 7 superclasses, generating the following figures:
- script_plots/example_coasttrain_naip_remapped.png
- script_plots/example_coasttrain_naip6class_remapped.png
- script_plots/example_coasttrain_quads_remapped.png
- script_plots/example_coasttrain_madeira_remapped.png
- script_plots/example_coasttrain_dauphin_remapped.png
- script_plots/example_coasttrain_s2_remapped.png
- script_plots/example_coasttrain_s2_4class_remapped.png
- script_plots/example_coasttrain_l8elwha_remapped.png
- script_plots/example_coasttrain_l8_remapped.png
- script_plots/example_coasttrain_sandwich_remapped.png
You may use the Doodler utility gen_remapped_images_and_labels.py with the provided config .json format files in the remap_config_files folder. When prompted "Select file containing super class names and class aliases", select one of the provided config (json) files. For each of the npz files in the dataset that the config file describes, the program will create remapped label images with the suffix _remap_label.jpg, as well as the images themselves and semi-transparent overlays showing the colorized mask.
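Conceptually, remapping replaces each original class integer with the integer of its superclass. The sketch below illustrates that idea with a lookup table; it does not reproduce the Doodler utility or the format of the provided config files, and the class names, superclass names, and aliases are hypothetical.

```python
import numpy as np

# Hypothetical original classes, superclasses, and aliases (class name -> superclass name).
classes = ["water", "whitewater", "sand", "gravel", "marsh"]
superclasses = ["water", "sediment", "vegetation"]
aliases = {
    "water": "water",
    "whitewater": "water",
    "sand": "sediment",
    "gravel": "sediment",
    "marsh": "vegetation",
}


def remap_labels(label, classes, superclasses, aliases):
    """Map an integer label image from original class indices to superclass indices."""
    # lookup[i] is the superclass index for original class index i.
    lookup = np.array([superclasses.index(aliases[name]) for name in classes])
    return lookup[np.asarray(label)]


# Toy 2 x 3 label image using the original class indices.
label = np.array([[0, 1, 2], [3, 4, 0]])
remapped = remap_labels(label, classes, superclasses, aliases)
print(remapped)
```

Indexing the lookup array with the whole label image remaps every pixel in one step, without an explicit loop over pixels.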