[Ai-1477]-darwin-v2-export-fix #721

Merged
merged 76 commits into from
Dec 12, 2023
Changes from 55 commits
Commits
f963e36
fixing test
ChristofferEdlund Oct 16, 2023
08fc262
added polygon and complex polygon tests with bounding boxes
ChristofferEdlund Oct 16, 2023
db3fbde
extended convertion tests
ChristofferEdlund Oct 16, 2023
5d98462
formatter
ChristofferEdlund Oct 17, 2023
69f33f0
updated test to reflect adding of bounding box
ChristofferEdlund Nov 14, 2023
b3c82e0
updated tests to check for new format
ChristofferEdlund Nov 14, 2023
594844a
added additional test for box and tag
ChristofferEdlund Nov 14, 2023
5ff60ae
removed prints
ChristofferEdlund Nov 14, 2023
8ded8db
black reformat
ChristofferEdlund Nov 14, 2023
f45ce38
black format
ChristofferEdlund Nov 14, 2023
828653b
removed an import
ChristofferEdlund Nov 14, 2023
054e4bf
added schema ref
ChristofferEdlund Nov 14, 2023
ef5f7b8
reformated utils
ChristofferEdlund Nov 14, 2023
cfce1bc
added support RemoteDatasetV1 parsing and updated tests
ChristofferEdlund Nov 14, 2023
4de5d9e
black fomrat
ChristofferEdlund Nov 14, 2023
37fb419
additional black magic
ChristofferEdlund Nov 14, 2023
15572bc
merge from master
ChristofferEdlund Nov 15, 2023
a74b006
merge from master
ChristofferEdlund Nov 15, 2023
0914e60
fixed conflic
ChristofferEdlund Nov 15, 2023
3d62044
minor fixes
ChristofferEdlund Nov 15, 2023
ab60ec6
removed ignore of empty files
ChristofferEdlund Nov 15, 2023
b178a6c
updated tests for new (old) behaviour
ChristofferEdlund Nov 15, 2023
c21ab07
added test case
ChristofferEdlund Nov 15, 2023
2991bc8
updated test
ChristofferEdlund Nov 16, 2023
f712ce7
updated code to work with bounding boxes and polygon, also added togg…
ChristofferEdlund Nov 17, 2023
b090f61
black
ChristofferEdlund Nov 17, 2023
2af57b4
adjusting paths to pass tests
ChristofferEdlund Nov 17, 2023
2d9f9d2
black
ChristofferEdlund Nov 17, 2023
20ae224
handling polyg and bbox bounding box anno
ChristofferEdlund Nov 17, 2023
aa65a29
[PY-401][external] Restore Meta Item & ItemQuery functions (#718)
JBWilkie Nov 16, 2023
ab4627a
[PY-402][external] Archive Meta Item & ItemQuery functions (#719)
JBWilkie Nov 16, 2023
96227e5
[PY-403][external] Set Priority Meta Item & ItemQuery functions (#720)
JBWilkie Nov 16, 2023
859de11
automatic ruff --fix changes (#723)
Nathanjp91 Nov 16, 2023
ce7f4d3
minor updates to tests
ChristofferEdlund Nov 17, 2023
c66ceb8
added make polygon tests for darwin_v2 format
ChristofferEdlund Nov 17, 2023
691eed7
updating tests to accomidate darwin V2 format
ChristofferEdlund Nov 17, 2023
7106dc5
black
ChristofferEdlund Nov 17, 2023
179da54
added nifty V2 test
ChristofferEdlund Nov 17, 2023
4c30f17
Merge remote-tracking branch 'origin' into ai-1410-darwin-v2-export-fix
ChristofferEdlund Nov 17, 2023
d0b33e5
black
ChristofferEdlund Nov 17, 2023
09b105e
refactor
ChristofferEdlund Nov 20, 2023
5b1ddc4
ruff --fix
ChristofferEdlund Nov 20, 2023
3a4785a
removed try catch in stacked targets
ChristofferEdlund Nov 20, 2023
ec61777
removed print
ChristofferEdlund Nov 20, 2023
36ca721
Reverted specific files to state in commit 09b105eed12f8d0a3d21beaf62…
ChristofferEdlund Nov 20, 2023
6130abe
converting complex and regular polygon to import format
ChristofferEdlund Nov 21, 2023
9bf55b9
added a potential e2e import fix
ChristofferEdlund Nov 21, 2023
6689d9f
removed debug prints
ChristofferEdlund Nov 21, 2023
d36f7d2
latest sync
ChristofferEdlund Nov 22, 2023
d0f7ca8
updated code to pass e2e tests
ChristofferEdlund Nov 22, 2023
4371111
black and ruff --fix all
ChristofferEdlund Nov 22, 2023
da8bf83
updated formatting
ChristofferEdlund Nov 22, 2023
3f4b740
minor changes to complex polygon
ChristofferEdlund Nov 22, 2023
28024bc
minor fix
ChristofferEdlund Nov 22, 2023
31a8314
black
ChristofferEdlund Nov 22, 2023
12239c8
minor updates based on comments
ChristofferEdlund Nov 22, 2023
c942a81
added PolygonPath and PolygonPaths definitions
ChristofferEdlund Nov 27, 2023
d75dcdd
merge
ChristofferEdlund Nov 27, 2023
3bac7e1
extended convertion tests
ChristofferEdlund Oct 16, 2023
05ed7c8
local changes
ChristofferEdlund Nov 27, 2023
41dd767
reverted changes to internal darwin format, convertion to v2 only don…
ChristofferEdlund Nov 27, 2023
f41a324
black
ChristofferEdlund Nov 27, 2023
dc2ac42
ruff and black
ChristofferEdlund Nov 27, 2023
a50f84e
merged from master
ChristofferEdlund Nov 27, 2023
7175c12
removed settings.json changes
ChristofferEdlund Nov 27, 2023
f06759c
reverting to non-v2 data.zip
ChristofferEdlund Nov 27, 2023
395b2c1
merged darwin_v1_test changes from origin master
ChristofferEdlund Nov 27, 2023
0153c78
changed json stream to normal json
ChristofferEdlund Nov 27, 2023
09b00d2
Fix `RecursionError` in `Item` class (#732)
saurbhc Nov 28, 2023
72bc6fc
commiting changes
ChristofferEdlund Nov 28, 2023
89a8a24
Merge remote-tracking branch 'origin' into ai-1410-darwin-v2-export-fix
ChristofferEdlund Nov 28, 2023
0e095c4
fixed json-stream error when checking for non-empty lists
ChristofferEdlund Nov 28, 2023
28c168c
remove unused import
ChristofferEdlund Nov 28, 2023
e991394
fixed video to image convertion bug when folders are used
ChristofferEdlund Nov 30, 2023
ec92bda
removed debug print
ChristofferEdlund Nov 30, 2023
ce7788c
removed debug print
ChristofferEdlund Nov 30, 2023
4 changes: 4 additions & 0 deletions .vscode/settings.json
@@ -20,4 +20,8 @@
     "python.analysis.autoImportCompletions": true,
     "python.testing.pytestEnabled": true,
     "python.analysis.typeCheckingMode": "basic",
+    "python.testing.pytestArgs": [
+        "e2e_tests"
+    ],
+    "python.testing.unittestEnabled": false,
 }
6 changes: 6 additions & 0 deletions darwin/dataset/local_dataset.py
@@ -67,6 +67,7 @@ def __init__(
         split: str = "default",
         split_type: str = "random",
         release_name: Optional[str] = None,
+        keep_empty_annotations: bool = False,
     ):
         self.dataset_path = dataset_path
         self.annotation_type = annotation_type
@@ -75,6 +76,7 @@ def __init__(
         self.original_classes = None
         self.original_images_path: Optional[List[Path]] = None
         self.original_annotations_path: Optional[List[Path]] = None
+        self.keep_empty_annotations = keep_empty_annotations

         release_path, annotations_dir, images_dir = self._initial_setup(
             dataset_path, release_name
@@ -101,6 +103,7 @@ def __init__(
             split,
             partition,
             split_type,
+            keep_empty_annotations,
         )

         if len(self.images_path) == 0:
@@ -130,12 +133,15 @@ def _setup_annotations_and_images(
         split,
         partition,
         split_type,
+        keep_empty_annotations: bool = False,
     ):
         # Find all the annotations and their corresponding images
         for annotation_path in sorted(annotations_dir.glob("**/*.json")):
             darwin_json = stream_darwin_json(annotation_path)
             image_path = get_image_path_from_stream(darwin_json, images_dir)
             if image_path.exists():
+                if not keep_empty_annotations and len(darwin_json["annotations"]) < 1:
+                    continue
                 self.images_path.append(image_path)
                 self.annotations_path.append(annotation_path)
                 continue
7 changes: 6 additions & 1 deletion darwin/dataset/remote_dataset.py
@@ -161,7 +161,7 @@ def split_video_annotations(self, release_name: str = "latest") -> None:

             frame_annotations = split_video_annotation(darwin_annotation)
             for frame_annotation in frame_annotations:
-                annotation = build_image_annotation(frame_annotation)
+                annotation = self._build_image_annotation(frame_annotation)

                 video_frame_annotations_path = annotations_path / annotation_file.stem
                 video_frame_annotations_path.mkdir(exist_ok=True, parents=True)
@@ -947,3 +947,8 @@ def local_images_path(self) -> Path:
     def identifier(self) -> DatasetIdentifier:
         """The ``DatasetIdentifier`` of this ``RemoteDataset``."""
         return DatasetIdentifier(team_slug=self.team, dataset_slug=self.slug)
+
+    def _build_image_annotation(
+        self, annotation_file: AnnotationFile
+    ) -> Dict[str, Any]:
+        return build_image_annotation(annotation_file)
8 changes: 7 additions & 1 deletion darwin/dataset/remote_dataset_v1.py
@@ -12,8 +12,9 @@
     UploadHandlerV1,
 )
 from darwin.dataset.utils import is_relative_to
-from darwin.datatypes import ItemId, PathLike
+from darwin.datatypes import AnnotationFile, ItemId, PathLike
 from darwin.exceptions import NotFound, ValidationError
+from darwin.exporter.formats.darwin_1_0 import build_image_annotation
 from darwin.item import DatasetItem
 from darwin.item_sorter import ItemSorter
 from darwin.utils import find_files, urljoin
@@ -512,3 +513,8 @@ def import_annotation(self, item_id: ItemId, payload: Dict[str, Any]) -> None:
         """

         self.client.import_annotation(item_id, payload=payload)
+
+    def _build_image_annotation(
+        self, annotation_file: AnnotationFile
+    ) -> Dict[str, Any]:
+        return build_image_annotation(annotation_file)
8 changes: 7 additions & 1 deletion darwin/dataset/remote_dataset_v2.py
@@ -22,8 +22,9 @@
     UploadHandlerV2,
 )
 from darwin.dataset.utils import is_relative_to
-from darwin.datatypes import ItemId, PathLike
+from darwin.datatypes import AnnotationFile, ItemId, PathLike
 from darwin.exceptions import NotFound, UnknownExportVersion
+from darwin.exporter.formats.darwin import build_image_annotation
 from darwin.item import DatasetItem
 from darwin.item_sorter import ItemSorter
 from darwin.utils import find_files, urljoin
@@ -543,3 +544,8 @@ def _fetch_stages(self, stage_type):
             workflow_id,
             [stage for stage in workflow["stages"] if stage["type"] == stage_type],
         )
+
+    def _build_image_annotation(
+        self, annotation_file: AnnotationFile
+    ) -> Dict[str, Any]:
+        return build_image_annotation(annotation_file)
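Review note on the three `_build_image_annotation` overrides: shared code in `split_video_annotations` now calls a per-class hook, and each dataset class binds the hook to the exporter it imports (`darwin_1_0` for V1, `darwin` for V2). A minimal sketch of that dispatch follows. All class and function names are hypothetical stand-ins, the returned dictionaries are heavily simplified, and the base class here raises `NotImplementedError`, whereas the diff gives `RemoteDataset` a concrete default.

```python
from typing import Any, Dict, List


def build_v1_sketch(frame: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for darwin.exporter.formats.darwin_1_0.build_image_annotation.
    return {
        "image": {"filename": frame["filename"]},
        "annotations": frame["annotations"],
    }


def build_v2_sketch(frame: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for darwin.exporter.formats.darwin.build_image_annotation.
    return {
        "version": "2.0",
        "item": {"name": frame["filename"]},
        "annotations": frame["annotations"],
    }


class RemoteDatasetSketch:
    def split_video_annotations(
        self, frames: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        # Shared logic calls the hook instead of a module-level builder,
        # so the output format follows the concrete dataset class.
        return [self._build_image_annotation(frame) for frame in frames]

    def _build_image_annotation(self, frame: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class RemoteDatasetV1Sketch(RemoteDatasetSketch):
    def _build_image_annotation(self, frame: Dict[str, Any]) -> Dict[str, Any]:
        return build_v1_sketch(frame)


class RemoteDatasetV2Sketch(RemoteDatasetSketch):
    def _build_image_annotation(self, frame: Dict[str, Any]) -> Dict[str, Any]:
        return build_v2_sketch(frame)
```

The design keeps the frame-splitting loop in one place while letting V1 and V2 datasets write their own on-disk format.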
12 changes: 11 additions & 1 deletion darwin/datatypes.py
@@ -537,6 +537,7 @@ def make_polygon(
     bounding_box: Optional[Dict] = None,
     subs: Optional[List[SubAnnotation]] = None,
     slot_names: Optional[List[str]] = None,
+    darwin_v1: bool = False,
 ) -> Annotation:
     """
     Creates and returns a polygon annotation.
@@ -565,9 +566,18 @@ def make_polygon(
     Annotation
         A polygon ``Annotation``.
     """
+
+    if darwin_v1:
+        polygon_data = {"path": point_path}
+    else:
+        # Lets handle darwin V2 datasets
+        if not isinstance(point_path[0], list):
+            point_path = [point_path]
+        polygon_data = {"paths": point_path}
+
     return Annotation(
         AnnotationClass(class_name, "polygon"),
-        _maybe_add_bounding_box_data({"path": point_path}, bounding_box),
+        _maybe_add_bounding_box_data(polygon_data, bounding_box),
         subs or [],
         slot_names=slot_names or [],
     )
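Review note on the `make_polygon` change: the v2 branch wraps a single list of points into a list of paths, while `darwin_v1=True` keeps the old single-`path` key. A minimal sketch of that normalization, isolated from `Annotation`; the function name `polygon_payload` and the type aliases are hypothetical. One deliberate difference: the sketch guards against an empty `point_path`, whereas the `point_path[0]` lookup in the diff would raise `IndexError` on an empty list.

```python
from typing import Any, Dict, List, Union

Point = Dict[str, float]
PointPath = List[Point]


def polygon_payload(
    point_path: Union[PointPath, List[PointPath]], darwin_v1: bool = False
) -> Dict[str, Any]:
    """Return polygon data keyed as "path" (v1) or "paths" (v2)."""
    if darwin_v1:
        return {"path": point_path}
    # A flat list of points is treated as one path and wrapped;
    # a list of lists is assumed to already be a list of paths.
    if point_path and not isinstance(point_path[0], list):
        point_path = [point_path]
    return {"paths": point_path}


triangle: PointPath = [
    {"x": 0.0, "y": 0.0},
    {"x": 4.0, "y": 0.0},
    {"x": 0.0, "y": 3.0},
]
```

Passing `triangle` or `[triangle]` yields the same v2 payload, so callers that already hold multi-path polygons are not wrapped a second time.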
3 changes: 3 additions & 0 deletions darwin/exporter/formats/coco.py
@@ -492,6 +492,7 @@ def _build_annotation(
     categories: Dict[str, int],
 ) -> Optional[Dict[str, Any]]:
     annotation_type = annotation.annotation_class.annotation_type
+
     if annotation_type == "polygon":
         sequences = convert_polygons_to_sequences(
             annotation.data["path"], rounding=False
@@ -561,6 +562,7 @@ def _build_annotation(
         return _build_annotation(
             annotation_file,
             annotation_id,
+            # TODO Update this to V2
             dt.make_polygon(
                 annotation.annotation_class.name,
                 [
@@ -571,6 +573,7 @@ def _build_annotation(
                 ],
                 None,
                 annotation.subs,
+                darwin_v1=True,
             ),
             categories,
         )
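Review note on `darwin_v1=True` in the COCO exporter: the polygon branch in the first hunk still reads `annotation.data["path"]`, so the polygon built here has to keep the v1 key until the TODO is addressed. A minimal sketch of one possible way to read both layouts; `polygon_paths` is a hypothetical helper and not part of darwin-py.

```python
from typing import Any, Dict, List

Point = Dict[str, float]


def polygon_paths(data: Dict[str, Any]) -> List[List[Point]]:
    """Return polygon paths from either the v1 ("path") or v2 ("paths") layout."""
    if "paths" in data:
        return data["paths"]
    if "path" in data:
        # v1 stores a single path; wrap it so callers always get a list of paths.
        return [data["path"]]
    raise KeyError("polygon data has neither 'path' nor 'paths'")


square: List[Point] = [
    {"x": 0.0, "y": 0.0},
    {"x": 1.0, "y": 0.0},
    {"x": 1.0, "y": 1.0},
    {"x": 0.0, "y": 1.0},
]
v1_data = {"path": square}
v2_data = {"paths": [square]}
```

A v2 payload has no `"path"` key, which is why a direct `data["path"]` lookup would fail on polygons created without the flag.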
161 changes: 120 additions & 41 deletions darwin/exporter/formats/darwin.py
@@ -16,65 +16,144 @@

 def build_image_annotation(annotation_file: dt.AnnotationFile) -> Dict[str, Any]:
     """
-    Builds and returns a dictionary with the annotations present in the given file.
+    Builds and returns a dictionary with the annotations present in the given file in Darwin v2 format.

     Parameters
     ----------
-    annotation_file: dt.AnnotationFile
+    annotation_file: AnnotationFile
         File with the image annotations to extract.
+        For schema, see: https://darwin-public.s3.eu-west-1.amazonaws.com/darwin_json/2.0/schema.json

     Returns
     -------
     Dict[str, Any]
-        A dictionary with the annotation from the given file. Has the following structure:
-
-        .. code-block:: python
-
-            {
-                "annotations": [
-                    {
-                        "annotation_type": { ... }, # annotation_data
-                        "name": "annotation class name",
-                        "bounding_box": { ... } # Optional parameter, only present if the file has a bounding box as well
-                    }
-                ],
-                "image": {
-                    "filename": "a_file_name.json",
-                    "height": 1000,
-                    "width": 2000,
-                    "url": "https://www.darwin.v7labs.com/..."
-                }
-            }
+        A dictionary with the annotations in Darwin v2 format.
     """
-    annotations: List[Dict[str, Any]] = []
-    print(annotations)
+    annotations_list: List[Dict[str, Any]] = []
+
     for annotation in annotation_file.annotations:
-        payload = {
-            annotation.annotation_class.annotation_type: _build_annotation_data(
-                annotation
-            ),
-            "name": annotation.annotation_class.name,
-        }
+        annotation_data = _build_v2_annotation_data(annotation)
+        annotations_list.append(annotation_data)
+
+    slots_data = _build_slots_data(annotation_file.slots)
+    item = _build_item_data(annotation_file)
+    item["slots"] = slots_data
+
+    return {
+        "version": "2.0",
+        "schema_ref": "https://darwin-public.s3.eu-west-1.amazonaws.com/darwin_json/2.0/schema.json",
+        "item": item,
+        "annotations": annotations_list,
+    }
+

-        if (
-            annotation.annotation_class.annotation_type == "complex_polygon"
-            or annotation.annotation_class.annotation_type == "polygon"
-        ) and "bounding_box" in annotation.data:
-            payload["bounding_box"] = annotation.data["bounding_box"]
+def _build_v2_annotation_data(annotation: dt.Annotation) -> Dict[str, Any]:
+    annotation_data = {"id": annotation.id, "name": annotation.annotation_class.name}
+    if annotation.annotation_class.annotation_type == "bounding_box":
+        annotation_data["bounding_box"] = _build_bounding_box_data(annotation.data)
+    elif annotation.annotation_class.annotation_type == "tag":
+        annotation_data["tag"] = {}
+    elif annotation.annotation_class.annotation_type == "polygon":
+        polygon_data = _build_polygon_data(annotation.data)
+        annotation_data["polygon"] = polygon_data
+        annotation_data["bounding_box"] = _build_bounding_box_data(annotation.data)
+    return annotation_data

-        annotations.append(payload)

+def _build_bounding_box_data(data: Dict[str, Any]) -> Dict[str, Any]:
+    if "bounding_box" in data:
+        data = data["bounding_box"]
     return {
-        "annotations": annotations,
-        "image": {
-            "filename": annotation_file.filename,
-            "height": annotation_file.image_height,
-            "width": annotation_file.image_width,
-            "url": annotation_file.image_url,
+        "h": data.get("h"),
+        "w": data.get("w"),
+        "x": data.get("x"),
+        "y": data.get("y"),
+    }
+
+
+def _build_polygon_data(
+    data: Dict[str, Any]
+) -> Dict[str, List[List[Dict[str, float]]]]:
+    """
+    Builds the polygon data for Darwin v2 format.
+
+    Parameters
+    ----------
+    data : Dict[str, Any]
+        The original data for the polygon annotation.
+
+    Returns
+    -------
+    Dict[str, List[List[Dict[str, float]]]]
+        The polygon data in the format required for Darwin v2 annotations.
+    """
+
+    return {"paths": data.get("paths", [])}
+
+
+def _build_item_data(annotation_file: dt.AnnotationFile) -> Dict[str, Any]:
+    """
+    Constructs the 'item' section of the Darwin v2 format annotation.
+
+    Parameters
+    ----------
+    annotation_file: dt.AnnotationFile
+        The AnnotationFile object containing annotation data.
+
+    Returns
+    -------
+    Dict[str, Any]
+        The 'item' section of the Darwin v2 format annotation.
+    """
+    return {
+        "name": annotation_file.filename,
+        "path": annotation_file.remote_path or "/",
+        "source_info": {
+            "dataset": {
+                "name": annotation_file.dataset_name,
+                "slug": annotation_file.dataset_name.lower().replace(" ", "-")
+                if annotation_file.dataset_name
+                else None,
+            },
+            "item_id": annotation_file.item_id,
+            "team": {
+                "name": None,  # TODO Replace with actual team name
+                "slug": None,  # TODO Replace with actual team slug
+            },
+            "workview_url": annotation_file.workview_url,
         },
     }
+
+
+def _build_slots_data(slots: List[dt.Slot]) -> List[Dict[str, Any]]:
+    """
+    Constructs the 'slots' data for the Darwin v2 format annotation.
+
+    Parameters
+    ----------
+    slots: List[Slot]
+        A list of Slot objects from the AnnotationFile.
+
+    Returns
+    -------
+    List[Dict[str, Any]]
+        The 'slots' data for the Darwin v2 format annotation.
+    """
+    slots_data = []
+    for slot in slots:
+        slot_data = {
+            "type": slot.type,
+            "slot_name": slot.name,
+            "width": slot.width,
+            "height": slot.height,
+            "thumbnail_url": slot.thumbnail_url,
+            "source_files": slot.source_files,
+        }
+        slots_data.append(slot_data)
+
+    return slots_data


 @deprecation.deprecated(
     deprecated_in="0.7.8",
     removed_in="0.8.0",
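Review note on the new v2 export path: the rewritten `build_image_annotation` wraps per-annotation entries in an envelope with `version`, `schema_ref`, `item` and `annotations`. A minimal, self-contained sketch of that assembly follows, assuming only what this hunk shows. `SketchAnnotation`, `annotation_entry`, `dataset_slug` and `export_document` are hypothetical stand-ins for `dt.Annotation` and the private helpers. The `item` section is trimmed to a few keys, with `path` fixed to `"/"` and `slots` left empty.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

SCHEMA_REF = (
    "https://darwin-public.s3.eu-west-1.amazonaws.com/darwin_json/2.0/schema.json"
)


@dataclass
class SketchAnnotation:
    """Hypothetical stand-in for dt.Annotation, reduced to the fields used here."""

    id: str
    name: str
    annotation_type: str
    data: Dict[str, Any] = field(default_factory=dict)


def bounding_box_fields(data: Dict[str, Any]) -> Dict[str, Any]:
    # Polygon data may nest the box under "bounding_box"; box data is flat.
    if "bounding_box" in data:
        data = data["bounding_box"]
    return {key: data.get(key) for key in ("h", "w", "x", "y")}


def annotation_entry(annotation: SketchAnnotation) -> Dict[str, Any]:
    entry: Dict[str, Any] = {"id": annotation.id, "name": annotation.name}
    if annotation.annotation_type == "bounding_box":
        entry["bounding_box"] = bounding_box_fields(annotation.data)
    elif annotation.annotation_type == "tag":
        entry["tag"] = {}
    elif annotation.annotation_type == "polygon":
        entry["polygon"] = {"paths": annotation.data.get("paths", [])}
        entry["bounding_box"] = bounding_box_fields(annotation.data)
    return entry


def dataset_slug(name: Optional[str]) -> Optional[str]:
    # Same derivation as the diff: lowercase, spaces replaced by hyphens.
    return name.lower().replace(" ", "-") if name else None


def export_document(
    filename: str, dataset_name: Optional[str], annotations: List[SketchAnnotation]
) -> Dict[str, Any]:
    return {
        "version": "2.0",
        "schema_ref": SCHEMA_REF,
        "item": {
            "name": filename,
            "path": "/",
            "source_info": {
                "dataset": {"name": dataset_name, "slug": dataset_slug(dataset_name)}
            },
            "slots": [],
        },
        "annotations": [annotation_entry(a) for a in annotations],
    }
```

Two behaviours follow from the code as shown in this view of the PR and may be worth a second look. A polygon whose data has no `bounding_box` key still gets a `bounding_box` entry, with all four values set to `None`. Annotation types other than `bounding_box`, `tag` and `polygon` are exported with only `id` and `name`. The slug is derived locally from the dataset name, and nothing in the diff confirms that it matches the slug stored on the platform.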