This repository provides a workflow for augmenting and randomizing images together with their corresponding labels (annotations created with labelImg). The aim is to prepare a well-structured dataset for training with the TensorFlow Object Detection API.
Create the following organized directory structure:
.
├───annotations
│   └───xml
├───data
│   └───split
├───images
│   ├───train
│   └───test
├───sample_images
├───sample_labels
│   AnnotationAugmentation.py
│   Augmentation.py
│   Labeling.py
│   Randomize.py
│   Readme.md
│   Renaming.py
│   TrainTestSplit.py
│   XMLtoCSV.py
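The folders in the layout above can be created by hand or with a few lines of Python. The following is a minimal sketch and not one of the repository's scripts; the helper name `create_layout` and its `root` argument are assumptions made for illustration.

```python
from pathlib import Path

# Folders taken from the directory tree above.
FOLDERS = [
    "annotations/xml",
    "data/split",
    "images/train",
    "images/test",
    "sample_images",
    "sample_labels",
]


def create_layout(root="."):
    """Create the dataset folders under `root` and return their paths."""
    created = []
    for folder in FOLDERS:
        path = Path(root) / folder
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_layout(".")` from the repository root would create any folders that are missing and leave existing ones untouched.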
- Prepare Sample Images and Labels:
  - Place raw images in the `sample_images` folder.
  - Use labelImg to label the sample images and save the annotations in the `sample_labels` folder.
- Image Augmentation:
  - Apply image augmentations using the script `Augmentation.py`.
- Label Augmentation:
  - Apply label augmentations using the script `AnnotationAugmentation.py`.
- Consolidation:
  - Copy the contents of the `sample_images` folder to the `images` folder.
  - Copy the contents of the `sample_labels` folder to the `annotations/xml` folder.
- Randomization:
  - Shuffle the images and labels within the `images` and `annotations/xml` folders using `Randomize.py`.
- Update Annotations:
  - Run `Labeling.py` to update the contents of the XML files with the new image and label directories.
- XML to CSV Conversion:
  - Convert all XML files to a single CSV using `XMLtoCSV.py`.
- Train-Test Split:
  - Create `train` and `test` subfolders within the `images` directory.
  - Split the dataset into train and test images using `TrainTestSplit.py`.
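The randomization and annotation-update steps only work if each image and its XML file are moved together. The following is a minimal sketch of that idea, not the actual contents of `Randomize.py` or `Labeling.py`. The function names, the `.jpg` extension, the zero-padded naming scheme, the `tmp_` staging prefix, and the fixed seed are all assumptions.

```python
import random
import xml.etree.ElementTree as ET
from pathlib import Path


def plan_shuffle(stems, seed=0):
    """Map each original file stem to a sequential name in shuffled order."""
    shuffled = sorted(stems)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    return {old: f"{i:05d}" for i, old in enumerate(shuffled, start=1)}


def apply_shuffle(image_dir, xml_dir, ext=".jpg", seed=0):
    """Rename image/XML pairs together and update <filename> and <path>."""
    image_dir, xml_dir = Path(image_dir), Path(xml_dir)
    # Only handle annotations that have a matching image.
    stems = [
        p.stem for p in xml_dir.glob("*.xml")
        if (image_dir / (p.stem + ext)).exists()
    ]
    mapping = plan_shuffle(stems, seed)

    # Stage every pair under a temporary name so that new names
    # cannot collide with names that are still in use.
    for old in mapping:
        (image_dir / (old + ext)).rename(image_dir / ("tmp_" + old + ext))
        (xml_dir / (old + ".xml")).rename(xml_dir / ("tmp_" + old + ".xml"))

    for old, new in mapping.items():
        new_image = image_dir / (new + ext)
        new_xml = xml_dir / (new + ".xml")
        (image_dir / ("tmp_" + old + ext)).rename(new_image)
        (xml_dir / ("tmp_" + old + ".xml")).rename(new_xml)

        # Keep the annotation consistent with the renamed image.
        tree = ET.parse(new_xml)
        root = tree.getroot()
        updates = {"filename": new_image.name, "path": str(new_image.resolve())}
        for tag, value in updates.items():
            node = root.find(tag)
            if node is not None:
                node.text = value
        tree.write(new_xml)
    return mapping
```

Because the mapping is built once and applied to both folders, an image and its label always receive the same new name, which is what keeps the later CSV conversion valid.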
Once the workflow completes:
- The `images` directory is organized into `train` and `test` subdirectories, each containing the images for its split.
- The `data` directory holds a `split` subdirectory with two CSV files containing the labels for the train and test images.
- All images and labels are sequentially named, and their directories are updated as part of the workflow.
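To illustrate how the two label CSV files relate to the images, here is a minimal sketch of converting labelImg's Pascal VOC XML into CSV rows and splitting them by image. It is not the code of `XMLtoCSV.py` or `TrainTestSplit.py`. The column names follow a layout commonly used with the TensorFlow Object Detection API, and the function names, the 80/20 ratio, and the seed are assumptions.

```python
import csv
import random
import xml.etree.ElementTree as ET

COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_path):
    """Return one row per <object> in a Pascal VOC annotation file."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


def split_rows(rows, test_fraction=0.2, seed=0):
    """Split rows by image so all boxes of one image land in the same subset."""
    filenames = sorted({row[0] for row in rows})
    random.Random(seed).shuffle(filenames)
    # Keep at least one test image whenever there is any data.
    n_test = max(1, round(len(filenames) * test_fraction)) if filenames else 0
    test_files = set(filenames[:n_test])
    train = [row for row in rows if row[0] not in test_files]
    test = [row for row in rows if row[0] in test_files]
    return train, test


def write_csv(rows, csv_path):
    """Write the rows to `csv_path` with a header line."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
```

Splitting on file names rather than on individual rows avoids the case where boxes from the same image end up in both the train and the test CSV.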
Following this workflow lets you augment, randomize, and prepare your dataset for use with the TensorFlow Object Detection API.