[AI-1260][internal] add loading of polygon support for object detection datasets #679
Merged: ChristofferEdlund merged 35 commits into master from ai-1260-add-loading-of-polygon-support-for-object-detection-datasets on Oct 20, 2023.
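For orientation, here is a minimal usage sketch of what this PR enables. The import path follows darwin-py's public API; the dataset path below is hypothetical:

from pathlib import Path

from darwin.dataset import LocalDataset

# Hypothetical local copy of a Darwin dataset export.
dataset = LocalDataset(
    dataset_path=Path("/datasets/traffic-signs"),
    annotation_type="bounding_box",
)

# Per the PR title and diff, a dataset requested with bounding_box
# annotations can now also draw classes (and boxes derived from polygon
# annotations) from polygon data.
print(dataset.num_classes)
print(dataset.parse_json(0)["annotations"])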
Commits (35 total; the diff below shows changes from 31 of them), all authored by ChristofferEdlund:
7ff1132 added albumentations transform test
88a2750 updated poetry file
520f9f3 added albumentations to poetry.lock
38fb230 added manual install of albumentations
bc5931c Merge remote-tracking branch 'origin/master' into ai-1260-add-loading…
9e83781 added support to load both polygon and bounding-box annotations for o…
264fe33 commit
31dc64c removed test that will be introduced in another pr
094cc70 added a check for duplicate classes (from polygon and bounding_boxes
65d43eb removed code that is not supposed to be in github workflow
a3dfca9 updated stratified to support bounding_box + polygon
99cb219 removed some printing
752b54f changes based on owen's feedback
09ba55b minor update
4341ca0 Merge remote-tracking branch 'origin/master' into ai-1260-add-loading…
199a71d black formatting
56788cf reverted classes functionality to old one, but added the ability to l…
3855279 linter check
ba78fe1 poetry lock fix
081f249 manually fixed some ruff issues
c5f7286 ignoring ruff import * issues in dataset_test.py
145ce20 refactored local_dataset class to appease ruff (to long init)
99c4186 added test to extract_classes with multiple annotation types selected
67dd274 added stratefied split logic to add polygons to bounding_box stratife…
d128a18 merged from master
7e1f194 BLACK
94da955 Merge remote-tracking branch 'origin/master' into ai-1260-add-loading…
04de9c5 revrting to old init
57797ca revrting to old init
a4431f8 made the refactor more like the original
0ce35b3 added black
f2bee69 fixed minor issue
6aab1ec removed hard val- and test- set requirements
0f799a5 is exhaust generator code present now?
2273fa2 no longer forcing users to have a training split
Diff (local_dataset.py):
@@ -3,7 +3,6 @@
 from typing import Any, Dict, Iterator, List, Optional, Tuple

 import numpy as np
 import orjson as json
 from PIL import Image as PILImage

 from darwin.dataset.utils import get_classes, get_release_path, load_pil_image
@@ -64,20 +63,6 @@ def __init__(
         split_type: str = "random",
         release_name: Optional[str] = None,
     ):
-        assert dataset_path is not None
-        release_path = get_release_path(dataset_path, release_name)
-        annotations_dir = release_path / "annotations"
-        assert annotations_dir.exists()
-        images_dir = dataset_path / "images"
-        assert images_dir.exists()
-
-        if partition not in ["train", "val", "test", None]:
-            raise ValueError("partition should be either 'train', 'val', or 'test'")
-        if split_type not in ["random", "stratified"]:
-            raise ValueError("split_type should be either 'random', 'stratified'")
-        if annotation_type not in ["tag", "polygon", "bounding_box"]:
-            raise ValueError("annotation_type should be either 'tag', 'bounding_box', or 'polygon'")
-
         self.dataset_path = dataset_path
         self.annotation_type = annotation_type
         self.images_path: List[Path] = []
@@ -86,15 +71,64 @@ def __init__(
         self.original_images_path: Optional[List[Path]] = None
         self.original_annotations_path: Optional[List[Path]] = None

+        release_path, annotations_dir, images_dir = self._initial_setup(
+            dataset_path, release_name
+        )
+        self._validate_inputs(partition, split_type, annotation_type)
         # Get the list of classes
+
+        annotation_types = [self.annotation_type]
+        # We fetch bounding_boxes annotations from selected polygons as well
+        if self.annotation_type == "bounding_boxes":
+            annotation_types.append("polygon")
         self.classes = get_classes(
-            self.dataset_path, release_name, annotation_type=self.annotation_type, remove_background=True
+            self.dataset_path,
+            release_name,
+            annotation_type=annotation_types,
+            remove_background=True,
         )
         self.num_classes = len(self.classes)
+        self._setup_annotations_and_images(
+            release_path,
+            annotations_dir,
+            images_dir,
+            annotation_type,
+            split,
+            partition,
+            split_type,
+        )
+
+        if len(self.images_path) == 0:
+            raise ValueError(
+                f"Could not find any {SUPPORTED_IMAGE_EXTENSIONS} file",
+                f" in {images_dir}",
+            )
+
+        assert len(self.images_path) == len(self.annotations_path)

-        stems = build_stems(release_path, annotations_dir, annotation_type, split, partition, split_type)
+    def _validate_inputs(self, partition, split_type, annotation_type):
+        if partition not in ["train", "val", "test", None]:
+            raise ValueError("partition should be either 'train', 'val', or 'test'")
+        if split_type not in ["random", "stratified"]:
+            raise ValueError("split_type should be either 'random', 'stratified'")
+        if annotation_type not in ["tag", "polygon", "bounding_box"]:
+            raise ValueError(
+                "annotation_type should be either 'tag', 'bounding_box', or 'polygon'"
+            )
+
+    # Find all the annotations and their corresponding images
+    def _setup_annotations_and_images(
+        self,
+        release_path,
+        annotations_dir,
+        images_dir,
+        annotation_type,
+        split,
+        partition,
+        split_type,
+    ):
+        stems = build_stems(
+            release_path, annotations_dir, annotation_type, split, partition, split_type
+        )
         for stem in stems:
             annotation_path = annotations_dir / f"{stem}.json"
             images = []
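As a hedged illustration of the class-loading change above: the diff now passes a list of annotation types to get_classes, so requesting bounding boxes can also pick up classes defined by polygons. The dataset path and release name below are hypothetical:

from pathlib import Path

from darwin.dataset.utils import get_classes

# Classes are gathered across both geometries; remove_background drops the
# implicit background class, as in the diff above.
classes = get_classes(
    Path("/datasets/traffic-signs"),  # hypothetical local dataset path
    release_name="latest",            # hypothetical release
    annotation_type=["bounding_box", "polygon"],
    remove_background=True,
)
print(classes)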
@@ -107,16 +141,24 @@ def __init__(
             if image_path.exists():
                 images.append(image_path)
             if len(images) < 1:
-                raise ValueError(f"Annotation ({annotation_path}) does not have a corresponding image")
+                raise ValueError(
+                    f"Annotation ({annotation_path}) does not have a corresponding image"
+                )
             if len(images) > 1:
-                raise ValueError(f"Image ({stem}) is present with multiple extensions. This is forbidden.")
+                raise ValueError(
+                    f"Image ({stem}) is present with multiple extensions. This is forbidden."
+                )
             self.images_path.append(images[0])
             self.annotations_path.append(annotation_path)

-        if len(self.images_path) == 0:
-            raise ValueError(f"Could not find any {SUPPORTED_IMAGE_EXTENSIONS} file", f" in {images_dir}")
-
-        assert len(self.images_path) == len(self.annotations_path)
+    def _initial_setup(self, dataset_path, release_name):
+        assert dataset_path is not None
+        release_path = get_release_path(dataset_path, release_name)
+        annotations_dir = release_path / "annotations"
+        assert annotations_dir.exists()
+        images_dir = dataset_path / "images"
+        assert images_dir.exists()
+        return release_path, annotations_dir, images_dir

Review comment on _initial_setup: "I like the extraction here, even if it is only to make Ruff happy!"

     def get_img_info(self, index: int) -> Dict[str, Any]:
         """
@@ -166,7 +208,9 @@ def get_height_and_width(self, index: int) -> Tuple[float, float]:
         parsed = parse_darwin_json(self.annotations_path[index], index)
         return parsed.image_height, parsed.image_width

-    def extend(self, dataset: "LocalDataset", extend_classes: bool = False) -> "LocalDataset":
+    def extend(
+        self, dataset: "LocalDataset", extend_classes: bool = False
+    ) -> "LocalDataset":
         """
         Extends the current dataset with another one.
@@ -261,7 +305,10 @@ def parse_json(self, index: int) -> Dict[str, Any]:
         # Filter out unused classes and annotations of a different type
         if self.classes is not None:
             annotations = [
-                a for a in annotations if a.annotation_class.name in self.classes and self.annotation_type_supported(a)
+                a
+                for a in annotations
+                if a.annotation_class.name in self.classes
+                and self.annotation_type_supported(a)
             ]
         return {
             "image_id": index,
@@ -278,15 +325,20 @@ def annotation_type_supported(self, annotation) -> bool:
         elif self.annotation_type == "bounding_box":
             is_bounding_box = annotation_type == "bounding_box"
             is_supported_polygon = (
-                annotation_type in ["polygon", "complex_polygon"] and "bounding_box" in annotation.data
+                annotation_type in ["polygon", "complex_polygon"]
+                and "bounding_box" in annotation.data
             )
             return is_bounding_box or is_supported_polygon
         elif self.annotation_type == "polygon":
             return annotation_type in ["polygon", "complex_polygon"]
         else:
-            raise ValueError("annotation_type should be either 'tag', 'bounding_box', or 'polygon'")
+            raise ValueError(
+                "annotation_type should be either 'tag', 'bounding_box', or 'polygon'"
+            )

-    def measure_mean_std(self, multi_threaded: bool = True) -> Tuple[np.ndarray, np.ndarray]:
+    def measure_mean_std(
+        self, multi_threaded: bool = True
+    ) -> Tuple[np.ndarray, np.ndarray]:
         """
         Computes mean and std of trained images, given the train loader.
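To make the filtering rule in annotation_type_supported concrete, here is a self-contained sketch; the annotation class below is a simplified stand-in for darwin-py's real one, not its actual API:

# Simplified stand-in: just a type name and a data dict.
class StubAnnotation:
    def __init__(self, annotation_type, data):
        self.annotation_type = annotation_type
        self.data = data

def supported_as_bounding_box(annotation):
    # Native bounding boxes always pass.
    if annotation.annotation_type == "bounding_box":
        return True
    # Polygons pass only when the export carries a precomputed bounding box.
    return (
        annotation.annotation_type in ["polygon", "complex_polygon"]
        and "bounding_box" in annotation.data
    )

print(supported_as_bounding_box(StubAnnotation("bounding_box", {})))               # True
print(supported_as_bounding_box(StubAnnotation("polygon", {"bounding_box": {}})))  # True
print(supported_as_bounding_box(StubAnnotation("polygon", {"path": []})))          # False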
@@ -309,7 +361,9 @@ def measure_mean_std(self, multi_threaded: bool = True) -> Tuple[np.ndarray, np.ndarray]:
             results = pool.map(self._return_mean, self.images_path)
             mean = np.sum(np.array(results), axis=0) / len(self.images_path)
             # Online image_classification deviation
-            results = pool.starmap(self._return_std, [[item, mean] for item in self.images_path])
+            results = pool.starmap(
+                self._return_std, [[item, mean] for item in self.images_path]
+            )
             std_sum = np.sum(np.array([item[0] for item in results]), axis=0)
             total_pixel_count = np.sum(np.array([item[1] for item in results]))
             std = np.sqrt(std_sum / total_pixel_count)
@@ -355,14 +409,20 @@ def _compute_weights(labels: List[int]) -> np.ndarray:
     @staticmethod
     def _return_mean(image_path: Path) -> np.ndarray:
         img = np.array(load_pil_image(image_path))
-        mean = np.array([np.mean(img[:, :, 0]), np.mean(img[:, :, 1]), np.mean(img[:, :, 2])])
+        mean = np.array(
+            [np.mean(img[:, :, 0]), np.mean(img[:, :, 1]), np.mean(img[:, :, 2])]
+        )
         return mean / 255.0

     # Loads an image with OpenCV and returns the channel wise std of the image.
     @staticmethod
     def _return_std(image_path: Path, mean: np.ndarray) -> Tuple[np.ndarray, float]:
         img = np.array(load_pil_image(image_path)) / 255.0
-        m2 = np.square(np.array([img[:, :, 0] - mean[0], img[:, :, 1] - mean[1], img[:, :, 2] - mean[2]]))
+        m2 = np.square(
+            np.array(
+                [img[:, :, 0] - mean[0], img[:, :, 1] - mean[1], img[:, :, 2] - mean[2]]
+            )
+        )
         return np.sum(np.sum(m2, axis=1), 1), m2.size / 3.0

     def __getitem__(self, index: int):
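For reference, measure_mean_std and the two helpers above follow a two-pass scheme: the mean is the unweighted average of per-image channel means, and the std divides the accumulated squared deviations by the total per-channel pixel count. A condensed numpy sketch of the same idea, with toy arrays standing in for loaded images:

import numpy as np

# Toy stand-ins for loaded RGB images (HxWx3, values in [0, 1]).
images = [np.random.rand(4, 4, 3), np.random.rand(2, 2, 3)]

# Pass 1: unweighted average of per-image channel means.
mean = np.mean([img.mean(axis=(0, 1)) for img in images], axis=0)

# Pass 2: accumulate squared deviations and the pixel count per channel.
sq_dev_sum = np.zeros(3)
pixel_count = 0
for img in images:
    sq_dev_sum += ((img - mean) ** 2).sum(axis=(0, 1))
    pixel_count += img.shape[0] * img.shape[1]

std = np.sqrt(sq_dev_sum / pixel_count)
print(mean, std)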
@@ -432,7 +492,10 @@ def build_stems(
     """

     if partition is None:
-        return (str(e.relative_to(annotations_dir).parent / e.stem) for e in sorted(annotations_dir.glob("**/*.json")))
+        return (
+            str(e.relative_to(annotations_dir).parent / e.stem)
+            for e in sorted(annotations_dir.glob("**/*.json"))
+        )

     if split_type == "random":
         split_filename = f"{split_type}_{partition}.txt"
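Finally, the partition=None branch of build_stems shown above simply enumerates every annotation stem relative to the annotations directory. A hedged re-creation against a hypothetical layout:

from pathlib import Path

# Hypothetical release layout: <release>/annotations/**/*.json.
annotations_dir = Path("/datasets/traffic-signs/releases/latest/annotations")

# e.g. annotations/a/1.json -> "a/1", annotations/2.json -> "2"; __init__
# later pairs each stem with an image file of the same name.
stems = (
    str(e.relative_to(annotations_dir).parent / e.stem)
    for e in sorted(annotations_dir.glob("**/*.json"))
)
print(list(stems))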
Review comments:
"Refactoring the init function to make ruff happy"
"I like this refactoring - thanks ruff!"