[AI-1260][internal] add loading of polygon support for object detection datasets #679

Merged
Changes from 10 commits
35 commits
7ff1132
added albumentations transform test
ChristofferEdlund Sep 29, 2023
88a2750
updated poetry file
ChristofferEdlund Sep 29, 2023
520f9f3
added albumentations to poetry.lock
ChristofferEdlund Sep 29, 2023
38fb230
added manual install of albumentations
ChristofferEdlund Sep 29, 2023
bc5931c
Merge remote-tracking branch 'origin/master' into ai-1260-add-loading…
ChristofferEdlund Oct 9, 2023
9e83781
added support to load both polygon and bounding-box annotations for o…
ChristofferEdlund Oct 9, 2023
264fe33
commit
ChristofferEdlund Oct 9, 2023
31dc64c
removed test that will be introduced in another pr
ChristofferEdlund Oct 9, 2023
094cc70
added a check for duplicate classes (from polygon and bounding_boxes
ChristofferEdlund Oct 9, 2023
65d43eb
removed code that is not supposed to be in github workflow
ChristofferEdlund Oct 9, 2023
a3dfca9
updated stratified to support bounding_box + polygon
ChristofferEdlund Oct 10, 2023
99cb219
removed some printing
ChristofferEdlund Oct 11, 2023
752b54f
changes based on owen's feedback
ChristofferEdlund Oct 13, 2023
09ba55b
minor update
ChristofferEdlund Oct 17, 2023
4341ca0
Merge remote-tracking branch 'origin/master' into ai-1260-add-loading…
ChristofferEdlund Oct 17, 2023
199a71d
black formatting
ChristofferEdlund Oct 17, 2023
56788cf
reverted classes functionality to old one, but added the ability to l…
ChristofferEdlund Oct 17, 2023
3855279
linter check
ChristofferEdlund Oct 17, 2023
ba78fe1
poetry lock fix
ChristofferEdlund Oct 17, 2023
081f249
manually fixed some ruff issues
ChristofferEdlund Oct 17, 2023
c5f7286
ignoring ruff import * issues in dataset_test.py
ChristofferEdlund Oct 17, 2023
145ce20
refactored local_dataset class to appease ruff (to long init)
ChristofferEdlund Oct 17, 2023
99c4186
added test to extract_classes with multiple annotation types selected
ChristofferEdlund Oct 17, 2023
67dd274
added stratefied split logic to add polygons to bounding_box stratife…
ChristofferEdlund Oct 17, 2023
d128a18
merged from master
ChristofferEdlund Oct 17, 2023
7e1f194
BLACK
ChristofferEdlund Oct 17, 2023
94da955
Merge remote-tracking branch 'origin/master' into ai-1260-add-loading…
ChristofferEdlund Oct 17, 2023
04de9c5
revrting to old init
ChristofferEdlund Oct 17, 2023
57797ca
revrting to old init
ChristofferEdlund Oct 17, 2023
a4431f8
made the refactor more like the original
ChristofferEdlund Oct 17, 2023
0ce35b3
added black
ChristofferEdlund Oct 17, 2023
f2bee69
fixed minor issue
ChristofferEdlund Oct 17, 2023
6aab1ec
removed hard val- and test- set requirements
ChristofferEdlund Oct 18, 2023
0f799a5
is exhaust generator code present now?
ChristofferEdlund Oct 18, 2023
2273fa2
no longer forcing users to have a training split
ChristofferEdlund Oct 19, 2023
41 changes: 34 additions & 7 deletions darwin/dataset/utils.py
@@ -131,6 +131,19 @@ def make_class_lists(release_path: Path) -> None:
                 f.write("\n".join(classes_names))
 
 
+def get_classes_from_file(path):
+    """Helper function to read class names from a file."""
+    if path.exists():
+        return path.read_text().splitlines()
+    return []
+
+
+def available_annotation_types(release_path):
+    """Returns a list of available annotation types based on the existing files."""
+    files = [p.name for p in release_path.glob("lists/classes_*.txt")]
+    return [f[len("classes_") : -len(".txt")] for f in files]
+
+
 def get_classes(
     dataset_path: PathLike,
     release_name: Optional[str] = None,
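
Review note: the two helpers added in this hunk can be tried in isolation. The sketch below copies them from the diff (adding type hints) and runs them against a throwaway directory. `build_demo_release` and the class names in it are hypothetical fixtures, not part of darwin-py; only the `lists/classes_<type>.txt` layout comes from the diff.

```python
import tempfile
from pathlib import Path
from typing import List


def get_classes_from_file(path: Path) -> List[str]:
    """Read class names from a file; a missing file yields an empty list."""
    if path.exists():
        return path.read_text().splitlines()
    return []


def available_annotation_types(release_path: Path) -> List[str]:
    """Derive annotation types from the lists/classes_<type>.txt file names."""
    files = [p.name for p in release_path.glob("lists/classes_*.txt")]
    # Strip the "classes_" prefix and the ".txt" suffix from each file name.
    return [f[len("classes_") : -len(".txt")] for f in files]


def build_demo_release(root: Path) -> Path:
    """Hypothetical fixture: a release folder holding two class lists."""
    lists = root / "lists"
    lists.mkdir(parents=True)
    (lists / "classes_polygon.txt").write_text("__background__\ncat\ndog")
    (lists / "classes_bounding_box.txt").write_text("dog\nbird")
    return root


with tempfile.TemporaryDirectory() as tmp:
    release = build_demo_release(Path(tmp) / "release")
    # glob() does not guarantee an order, so sort before comparing.
    found_types = sorted(available_annotation_types(release))
    polygon_classes = get_classes_from_file(release / "lists/classes_polygon.txt")
    missing_classes = get_classes_from_file(release / "lists/classes_tag.txt")
```

Returning an empty list for a missing file is what lets the caller loop over several annotation types without checking first which class lists exist.
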
@@ -147,7 +160,7 @@ def get_classes(
     release_name : Optional[str], default: None
         Version of the dataset.
     annotation_type : str, default: "polygon"
-        The type of annotation classes [tag, polygon].
+        The type of annotation classes [tag, polygon, bounding_box].
     remove_background : bool, default: True
         Removes the background class (if exists) from the list of classes.
 
@@ -160,10 +173,26 @@ def get_classes(
     dataset_path = Path(dataset_path)
     release_path = get_release_path(dataset_path, release_name)
 
-    classes_path = release_path / f"lists/classes_{annotation_type}.txt"
-    classes = classes_path.read_text().splitlines()
-    if remove_background and classes[0] == "__background__":
+    # If annotation_type is a string and is 'bounding_box', also consider polygons
+    if isinstance(annotation_type, str):
+        if annotation_type == "bounding_box":
+            annotation_types_to_load = [annotation_type, "polygon"]
+        else:
+            annotation_types_to_load = [annotation_type]
+
+    classes = []  # Use a list to maintain order
+    for atype in annotation_types_to_load:
+        classes_file_path = release_path / f"lists/classes_{atype}.txt"
+        for cls in get_classes_from_file(classes_file_path):
+            if cls not in classes:  # Only add if it's not already in the list
+                classes.append(cls)
+
+    if remove_background and classes and classes[0] == "__background__":
         classes = classes[1:]
+
+    available_types = available_annotation_types(release_path)
+    assert len(classes) > 0, f"No classes found for {annotation_type}. Supported types are: {', '.join(available_types)}"
+
     return classes
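
Review note: the merge behaviour in this hunk is easier to follow without the file system. The sketch below is a hypothetical in-memory stand-in (`merge_classes` and the `demo` data are illustrative names, not darwin-py API) that follows the same steps as the new `get_classes` body: pick the types to load, append classes in order while skipping duplicates, then strip a leading `__background__`. It assumes `annotation_type` is always a string; the diff does not show what should happen otherwise.

```python
from typing import Dict, List


def merge_classes(
    lists_by_type: Dict[str, List[str]],
    annotation_type: str,
    remove_background: bool = True,
) -> List[str]:
    """Hypothetical in-memory version of the class-loading logic in the diff."""
    # "bounding_box" also pulls in polygon classes, as in the diff.
    if annotation_type == "bounding_box":
        types_to_load = [annotation_type, "polygon"]
    else:
        types_to_load = [annotation_type]

    classes: List[str] = []  # a list keeps first-seen order
    for atype in types_to_load:
        for cls in lists_by_type.get(atype, []):  # missing type: no classes
            if cls not in classes:
                classes.append(cls)

    # Only a background entry at index 0 is stripped.
    if remove_background and classes and classes[0] == "__background__":
        classes = classes[1:]
    return classes


demo = {
    "polygon": ["__background__", "cat", "dog"],
    "bounding_box": ["dog", "bird"],
}
polygon_only = merge_classes(demo, "polygon")
merged = merge_classes(demo, "bounding_box")
```

With this toy input, `merged` still contains `__background__`, because the bounding-box classes are loaded first and the background entry is no longer at index 0. Whether real `classes_bounding_box.txt` files start with `__background__` is not visible in this diff, so this may or may not matter in practice, but it seems worth a test case.
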


@@ -627,9 +656,7 @@ def compute_distributions(
                 if annotation_file is None:
                     continue
 
-                annotation_class_names: List[str] = [
-                    annotation.annotation_class.name for annotation in annotation_file.annotations
-                ]
+                annotation_class_names: List[str] = [annotation.annotation_class.name for annotation in annotation_file.annotations]
 
                 class_distribution[partition] += Counter(set(annotation_class_names))
                 instance_distribution[partition] += Counter(annotation_class_names)
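
Review note: this hunk only changes formatting, but the two `Counter` updates are easy to misread. A minimal sketch of what they count follows; the `files_by_partition` data is made up for illustration. Wrapping the names in `set` counts each class once per file, while the plain list counts every annotation instance.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical class names per annotation file, grouped by partition.
files_by_partition: Dict[str, List[List[str]]] = {
    "train": [["cat", "cat", "dog"], ["dog"]],
    "val": [["cat"]],
}

class_distribution: Dict[str, Counter] = {p: Counter() for p in files_by_partition}
instance_distribution: Dict[str, Counter] = {p: Counter() for p in files_by_partition}

for partition, files in files_by_partition.items():
    for annotation_class_names in files:
        # Number of files in which each class appears at least once.
        class_distribution[partition] += Counter(set(annotation_class_names))
        # Total number of annotations per class.
        instance_distribution[partition] += Counter(annotation_class_names)
```
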