Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AI-1260][internal] add loading of polygon support for object detection datasets #679

Conversation

ChristofferEdlund
Copy link
Contributor

@ChristofferEdlund ChristofferEdlund commented Oct 9, 2023

Problem

In the initial implementation, when loading classes for datasets labeled with the bounding_box annotation type, it wasn't taking into consideration that the polygon annotations could also be relevant because they contain bounding box information in their annotations.

Solution

We modified the load_classes function to incorporate the following logic:

  1. When the annotation_type is specified as bounding_box, the function now additionally considers polygon annotations.
  2. It attempts to load class labels from both classes_bounding_box.txt and classes_polygon.txt files, combining the two lists if both exist.
  3. The changes ensure a more comprehensive loading of classes that may be relevant to object detection tasks.

Aka, we now load BOTH bounding_box and polygon classes that has been exported per default.

Changelog

  • Enhanced the load_classes function to load polygon class labels when the bounding_box annotation type is specified.
  • Ensured a combined list of classes from both bounding boxes and polygons for object detection datasets.

@linear
Copy link

linear bot commented Oct 9, 2023

AI-1260 Add loading of polygon support for object-detection datasets in darwin-py

The polygon datasets has bounding box annotations but those can not be loaded with obejct-detection dataloades currently in darwin-py.

Lets add the feature to load BOTH object-detection and instance-segmentation annoations with the object-detection dataloader.

Copy link
Contributor

@owencjones owencjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes, and one query

darwin/dataset/utils.py Outdated Show resolved Hide resolved
darwin/dataset/utils.py Outdated Show resolved Hide resolved
darwin/dataset/utils.py Show resolved Hide resolved
tests/darwin/torch/dataset_test.py Outdated Show resolved Hide resolved
@owencjones owencjones changed the title [Ai-1260] add loading of polygon support for object detection datasets [AI-1260][internal] add loading of polygon support for object detection datasets Oct 11, 2023
@ChristofferEdlund ChristofferEdlund force-pushed the ai-1260-add-loading-of-polygon-support-for-object-detection-datasets branch from 017b1c0 to 99cb219 Compare October 13, 2023 12:55
@@ -64,20 +63,6 @@ def __init__(
split_type: str = "random",
release_name: Optional[str] = None,
):
assert dataset_path is not None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring the init function to make ruff happy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this refactoring - thanks ruff!

Copy link
Contributor Author

@ChristofferEdlund ChristofferEdlund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The major changes is that extract_classes and get_classes has the same default behavior.

FThe logic of loading polygon annotations with bounding_box are placed a level higher in stratified sampling and local_dataset. This is done by enabling loading of a list of annotation_types.

Copy link
Contributor

@owencjones owencjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sound to me, I think QA needs to be solid, and Jon should ideally have a glance, because this changes code I'm not as familiar with, that he probably wrote :)

raise ValueError(f"Could not find any {SUPPORTED_IMAGE_EXTENSIONS} file", f" in {images_dir}")

assert len(self.images_path) == len(self.annotations_path)
def _initial_setup(self, dataset_path, release_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the extraction here, even if it is only to make Ruff happy!

@@ -238,102 +285,97 @@ def get_coco_format_record(
image_id: Optional[Union[str, int]] = None,
classes: Optional[List[str]] = None,
) -> Dict[str, Any]:
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason behind removing the docblock? Internal function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's strange, its there in my code. Will see if I can get it back on the github as well.

@@ -93,7 +100,10 @@ def extract_classes(annotations_path: Path, annotation_type: str) -> Tuple[Dict[
continue

for annotation in annotation_file.annotations:
if annotation.annotation_class.annotation_type != annotation_type:
if (
Copy link
Contributor

@dorfmanrobert dorfmanrobert Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this may be inconsistent with this logic. Here it seems "complex_polygon" will not be considered in creating the class list (based on how this is called in make_class_lists), while in the linked logic "complex_polygon" is considered for an annotation.annotation_class.annotation_type.

Copy link
Contributor

@dorfmanrobert dorfmanrobert Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure which is better: remove complex_polygon from linked logic or add it in make_class_lists

Unless this is an intentional difference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great input, I have not changed anything considering that logic. So the behavior should be the same as main darwin-py in that regard. BUT, that does not stop us from fixing it if there is an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest, we skip this for now. It might be a bug, but it would be a pre-existing condition and we have the same behavior as before. Besides, I think that complex-polygons are getting deprecated.

We could raise a bug report thought.

Copy link
Contributor

@almazan almazan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Disclamer: really long PR to review, so I might have missed things. Haven't tested it myself.

@@ -64,20 +63,6 @@ def __init__(
split_type: str = "random",
release_name: Optional[str] = None,
):
assert dataset_path is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this refactoring - thanks ruff!

darwin/dataset/split_manager.py Outdated Show resolved Hide resolved
@ChristofferEdlund ChristofferEdlund merged commit 5b7ad8b into master Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants