[Ai-1477]-darwin-v2-export-fix #721

Merged on Dec 12, 2023 (76 commits)
Changes from 19 commits
Commits
f963e36
fixing test
ChristofferEdlund Oct 16, 2023
08fc262
added polygon and complex polygon tests with bounding boxes
ChristofferEdlund Oct 16, 2023
db3fbde
extended convertion tests
ChristofferEdlund Oct 16, 2023
5d98462
formatter
ChristofferEdlund Oct 17, 2023
69f33f0
updated test to reflect adding of bounding box
ChristofferEdlund Nov 14, 2023
b3c82e0
updated tests to check for new format
ChristofferEdlund Nov 14, 2023
594844a
added additional test for box and tag
ChristofferEdlund Nov 14, 2023
5ff60ae
removed prints
ChristofferEdlund Nov 14, 2023
8ded8db
black reformat
ChristofferEdlund Nov 14, 2023
f45ce38
black format
ChristofferEdlund Nov 14, 2023
828653b
removed an import
ChristofferEdlund Nov 14, 2023
054e4bf
added schema ref
ChristofferEdlund Nov 14, 2023
ef5f7b8
reformated utils
ChristofferEdlund Nov 14, 2023
cfce1bc
added support RemoteDatasetV1 parsing and updated tests
ChristofferEdlund Nov 14, 2023
4de5d9e
black fomrat
ChristofferEdlund Nov 14, 2023
37fb419
additional black magic
ChristofferEdlund Nov 14, 2023
15572bc
merge from master
ChristofferEdlund Nov 15, 2023
a74b006
merge from master
ChristofferEdlund Nov 15, 2023
0914e60
fixed conflic
ChristofferEdlund Nov 15, 2023
3d62044
minor fixes
ChristofferEdlund Nov 15, 2023
ab60ec6
removed ignore of empty files
ChristofferEdlund Nov 15, 2023
b178a6c
updated tests for new (old) behaviour
ChristofferEdlund Nov 15, 2023
c21ab07
added test case
ChristofferEdlund Nov 15, 2023
2991bc8
updated test
ChristofferEdlund Nov 16, 2023
f712ce7
updated code to work with bounding boxes and polygon, also added togg…
ChristofferEdlund Nov 17, 2023
b090f61
black
ChristofferEdlund Nov 17, 2023
2af57b4
adjusting paths to pass tests
ChristofferEdlund Nov 17, 2023
2d9f9d2
black
ChristofferEdlund Nov 17, 2023
20ae224
handling polyg and bbox bounding box anno
ChristofferEdlund Nov 17, 2023
aa65a29
[PY-401][external] Restore Meta Item & ItemQuery functions (#718)
JBWilkie Nov 16, 2023
ab4627a
[PY-402][external] Archive Meta Item & ItemQuery functions (#719)
JBWilkie Nov 16, 2023
96227e5
[PY-403][external] Set Priority Meta Item & ItemQuery functions (#720)
JBWilkie Nov 16, 2023
859de11
automatic ruff --fix changes (#723)
Nathanjp91 Nov 16, 2023
ce7f4d3
minor updates to tests
ChristofferEdlund Nov 17, 2023
c66ceb8
added make polygon tests for darwin_v2 format
ChristofferEdlund Nov 17, 2023
691eed7
updating tests to accomidate darwin V2 format
ChristofferEdlund Nov 17, 2023
7106dc5
black
ChristofferEdlund Nov 17, 2023
179da54
added nifty V2 test
ChristofferEdlund Nov 17, 2023
4c30f17
Merge remote-tracking branch 'origin' into ai-1410-darwin-v2-export-fix
ChristofferEdlund Nov 17, 2023
d0b33e5
black
ChristofferEdlund Nov 17, 2023
09b105e
refactor
ChristofferEdlund Nov 20, 2023
5b1ddc4
ruff --fix
ChristofferEdlund Nov 20, 2023
3a4785a
removed try catch in stacked targets
ChristofferEdlund Nov 20, 2023
ec61777
removed print
ChristofferEdlund Nov 20, 2023
36ca721
Reverted specific files to state in commit 09b105eed12f8d0a3d21beaf62…
ChristofferEdlund Nov 20, 2023
6130abe
converting complex and regular polygon to import format
ChristofferEdlund Nov 21, 2023
9bf55b9
added a potential e2e import fix
ChristofferEdlund Nov 21, 2023
6689d9f
removed debug prints
ChristofferEdlund Nov 21, 2023
d36f7d2
latest sync
ChristofferEdlund Nov 22, 2023
d0f7ca8
updated code to pass e2e tests
ChristofferEdlund Nov 22, 2023
4371111
black and ruff --fix all
ChristofferEdlund Nov 22, 2023
da8bf83
updated formatting
ChristofferEdlund Nov 22, 2023
3f4b740
minor changes to complex polygon
ChristofferEdlund Nov 22, 2023
28024bc
minor fix
ChristofferEdlund Nov 22, 2023
31a8314
black
ChristofferEdlund Nov 22, 2023
12239c8
minor updates based on comments
ChristofferEdlund Nov 22, 2023
c942a81
added PolygonPath and PolygonPaths definitions
ChristofferEdlund Nov 27, 2023
d75dcdd
merge
ChristofferEdlund Nov 27, 2023
3bac7e1
extended convertion tests
ChristofferEdlund Oct 16, 2023
05ed7c8
local changes
ChristofferEdlund Nov 27, 2023
41dd767
reverted changes to internal darwin format, convertion to v2 only don…
ChristofferEdlund Nov 27, 2023
f41a324
black
ChristofferEdlund Nov 27, 2023
dc2ac42
ruff and black
ChristofferEdlund Nov 27, 2023
a50f84e
merged from master
ChristofferEdlund Nov 27, 2023
7175c12
removed settings.json changes
ChristofferEdlund Nov 27, 2023
f06759c
reverting to non-v2 data.zip
ChristofferEdlund Nov 27, 2023
395b2c1
merged darwin_v1_test changes from origin master
ChristofferEdlund Nov 27, 2023
0153c78
changed json stream to normal json
ChristofferEdlund Nov 27, 2023
09b00d2
Fix `RecursionError` in `Item` class (#732)
saurbhc Nov 28, 2023
72bc6fc
commiting changes
ChristofferEdlund Nov 28, 2023
89a8a24
Merge remote-tracking branch 'origin' into ai-1410-darwin-v2-export-fix
ChristofferEdlund Nov 28, 2023
0e095c4
fixed json-stream error when checking for non-empty lists
ChristofferEdlund Nov 28, 2023
28c168c
remove unused import
ChristofferEdlund Nov 28, 2023
e991394
fixed video to image convertion bug when folders are used
ChristofferEdlund Nov 30, 2023
ec92bda
removed debug print
ChristofferEdlund Nov 30, 2023
ce7788c
removed debug print
ChristofferEdlund Nov 30, 2023
7 changes: 6 additions & 1 deletion darwin/dataset/remote_dataset.py
@@ -161,7 +161,7 @@ def split_video_annotations(self, release_name: str = "latest") -> None:

             frame_annotations = split_video_annotation(darwin_annotation)
             for frame_annotation in frame_annotations:
-                annotation = build_image_annotation(frame_annotation)
+                annotation = self._build_image_annotation(frame_annotation)

                 video_frame_annotations_path = annotations_path / annotation_file.stem
                 video_frame_annotations_path.mkdir(exist_ok=True, parents=True)
@@ -947,3 +947,8 @@ def local_images_path(self) -> Path:
     def identifier(self) -> DatasetIdentifier:
         """The ``DatasetIdentifier`` of this ``RemoteDataset``."""
         return DatasetIdentifier(team_slug=self.team, dataset_slug=self.slug)

+    def _build_image_annotation(
+        self, annotation_file: AnnotationFile
+    ) -> Dict[str, Any]:
+        return build_image_annotation(annotation_file)
8 changes: 7 additions & 1 deletion darwin/dataset/remote_dataset_v1.py
@@ -13,8 +13,9 @@
     UploadHandlerV1,
 )
 from darwin.dataset.utils import is_relative_to
-from darwin.datatypes import ItemId, PathLike
+from darwin.datatypes import AnnotationFile, ItemId, PathLike
 from darwin.exceptions import NotFound, ValidationError
+from darwin.exporter.formats.darwin_1_0 import build_image_annotation
 from darwin.item import DatasetItem
 from darwin.item_sorter import ItemSorter
 from darwin.utils import find_files, urljoin
@@ -512,3 +513,8 @@ def import_annotation(self, item_id: ItemId, payload: Dict[str, Any]) -> None:
         """

         self.client.import_annotation(item_id, payload=payload)

+    def _build_image_annotation(
+        self, annotation_file: AnnotationFile
+    ) -> Dict[str, Any]:
+        return build_image_annotation(annotation_file)
8 changes: 7 additions & 1 deletion darwin/dataset/remote_dataset_v2.py
@@ -22,8 +22,9 @@
     UploadHandlerV2,
 )
 from darwin.dataset.utils import is_relative_to
-from darwin.datatypes import ItemId, PathLike
+from darwin.datatypes import AnnotationFile, ItemId, PathLike
 from darwin.exceptions import NotFound, UnknownExportVersion
+from darwin.exporter.formats.darwin import build_image_annotation
 from darwin.item import DatasetItem
 from darwin.item_sorter import ItemSorter
 from darwin.utils import find_files, urljoin
@@ -543,3 +544,8 @@ def _fetch_stages(self, stage_type):
             workflow_id,
             [stage for stage in workflow["stages"] if stage["type"] == stage_type],
         )

+    def _build_image_annotation(
+        self, annotation_file: AnnotationFile
+    ) -> Dict[str, Any]:
+        return build_image_annotation(annotation_file)
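
The three `remote_dataset*.py` diffs share one idea: the shared video-splitting code calls a `_build_image_annotation` hook on `self`, and each dataset class binds that hook to the exporter for its own format version. A minimal sketch of that dispatch follows. Every name in it (`BaseDataset`, `DatasetV1`, `DatasetV2`, `build_v1`, `build_v2`, `split_frames`) is a hypothetical stand-in rather than a darwin-py type, and the base class here raises instead of supplying a default as the diff does.

```python
from typing import Any, Dict, List

Frame = Dict[str, Any]


def build_v1(frame: Frame) -> Dict[str, Any]:
    """Hypothetical stand-in for the legacy (Darwin 1.0) builder."""
    return {
        "annotations": frame["annotations"],
        "image": {"filename": frame["filename"]},
    }


def build_v2(frame: Frame) -> Dict[str, Any]:
    """Hypothetical stand-in for the Darwin JSON 2.0 builder."""
    return {
        "version": "2.0",
        "item": {"name": frame["filename"]},
        "annotations": frame["annotations"],
    }


class BaseDataset:
    def split_frames(self, frames: List[Frame]) -> List[Dict[str, Any]]:
        # Shared logic calls the hook instead of a module-level function,
        # so the output format follows the concrete dataset class.
        return [self._build_image_annotation(frame) for frame in frames]

    def _build_image_annotation(self, frame: Frame) -> Dict[str, Any]:
        raise NotImplementedError


class DatasetV1(BaseDataset):
    def _build_image_annotation(self, frame: Frame) -> Dict[str, Any]:
        return build_v1(frame)


class DatasetV2(BaseDataset):
    def _build_image_annotation(self, frame: Frame) -> Dict[str, Any]:
        return build_v2(frame)


frames = [{"filename": "0000000.png", "annotations": []}]
v1_out = DatasetV1().split_frames(frames)
v2_out = DatasetV2().split_frames(frames)
```

With this shape, the splitting code never needs to know which format version it is writing; only the subclass does.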
170 changes: 129 additions & 41 deletions darwin/exporter/formats/darwin.py
@@ -16,65 +16,153 @@

 def build_image_annotation(annotation_file: dt.AnnotationFile) -> Dict[str, Any]:
     """
-    Builds and returns a dictionary with the annotations present in the given file.
+    Builds and returns a dictionary with the annotations present in the given file in Darwin v2 format.

     Parameters
     ----------
-    annotation_file: dt.AnnotationFile
+    annotation_file: AnnotationFile
         File with the image annotations to extract.
+        For schema, see: https://darwin-public.s3.eu-west-1.amazonaws.com/darwin_json/2.0/schema.json

     Returns
     -------
     Dict[str, Any]
-        A dictionary with the annotation from the given file. Has the following structure:

-        .. code-block:: python

-            {
-                "annotations": [
-                    {
-                        "annotation_type": { ... }, # annotation_data
-                        "name": "annotation class name",
-                        "bounding_box": { ... } # Optional parameter, only present if the file has a bounding box as well
-                    }
-                ],
-                "image": {
-                    "filename": "a_file_name.json",
-                    "height": 1000,
-                    "width": 2000,
-                    "url": "https://www.darwin.v7labs.com/..."
-                }
-            }
+        A dictionary with the annotations in Darwin v2 format.
     """
-    annotations: List[Dict[str, Any]] = []
-    print(annotations)
+    annotations_list: List[Dict[str, Any]] = []

     for annotation in annotation_file.annotations:
-        payload = {
-            annotation.annotation_class.annotation_type: _build_annotation_data(
-                annotation
-            ),
-            "name": annotation.annotation_class.name,
-        }
+        annotation_data = _build_v2_annotation_data(annotation)
+        annotations_list.append(annotation_data)

+    slots_data = _build_slots_data(annotation_file.slots)
+    item = _build_item_data(annotation_file)
+    item["slots"] = slots_data

+    return {
+        "version": "2.0",
+        "schema_ref": "https://darwin-public.s3.eu-west-1.amazonaws.com/darwin_json/2.0/schema.json",
+        "item": item,
+        "annotations": annotations_list,
+    }


+def _build_v2_annotation_data(annotation: dt.Annotation) -> Dict[str, Any]:
+    annotation_data = {"id": annotation.id, "name": annotation.annotation_class.name}

+    if annotation.annotation_class.annotation_type == "bounding_box":
+        annotation_data["bounding_box"] = _build_bounding_box_data(annotation.data)
+    elif annotation.annotation_class.annotation_type == "tag":
+        annotation_data["tag"] = {}
+    elif annotation.annotation_class.annotation_type == "polygon":
+        annotation_data["polygon"] = _build_polygon_data(annotation.data)

+    return annotation_data


+def _build_bounding_box_data(data: Dict[str, Any]) -> Dict[str, Any]:
+    return {
+        "h": data.get("h"),
+        "w": data.get("w"),
+        "x": data.get("x"),
+        "y": data.get("y"),
+    }


+def _build_polygon_data(
+    data: Dict[str, Any]
+) -> Dict[str, List[List[Dict[str, float]]]]:
+    """
+    Builds the polygon data for Darwin v2 format.

+    Parameters
+    ----------
+    data : Dict[str, Any]
+        The original data for the polygon annotation.

+    Returns
+    -------
+    Dict[str, List[List[Dict[str, float]]]]
+        The polygon data in the format required for Darwin v2 annotations.
+    """
+    # Assuming the data contains a 'paths' key that is a list of lists of points,
+    # where each point is a dictionary with 'x' and 'y' keys.
+    paths = data.get("paths", [])
+    v2_paths = []

+    for path in paths:
+        v2_path = []
+        for point in path:
+            v2_point = {"x": point.get("x"), "y": point.get("y")}
+            v2_path.append(v2_point)
+        v2_paths.append(v2_path)

+    return {"paths": v2_paths}

-        if (
-            annotation.annotation_class.annotation_type == "complex_polygon"
-            or annotation.annotation_class.annotation_type == "polygon"
-        ) and "bounding_box" in annotation.data:
-            payload["bounding_box"] = annotation.data["bounding_box"]

-        annotations.append(payload)
+def _build_item_data(annotation_file: dt.AnnotationFile) -> Dict[str, Any]:
+    """
+    Constructs the 'item' section of the Darwin v2 format annotation.

+    Parameters
+    ----------
+    annotation_file: dt.AnnotationFile
+        The AnnotationFile object containing annotation data.

+    Returns
+    -------
+    Dict[str, Any]
+        The 'item' section of the Darwin v2 format annotation.
+    """
     return {
-        "annotations": annotations,
-        "image": {
-            "filename": annotation_file.filename,
-            "height": annotation_file.image_height,
-            "width": annotation_file.image_width,
-            "url": annotation_file.image_url,
+        "name": annotation_file.filename,
+        "path": annotation_file.remote_path or "/",
+        "source_info": {
+            "dataset": {
+                "name": annotation_file.dataset_name,
+                "slug": annotation_file.dataset_name.lower().replace(" ", "-")
+                if annotation_file.dataset_name
+                else None,
+            },
+            "item_id": annotation_file.item_id or "unknown-item-id",
+            "team": {
+                "name": None,  # TODO Replace with actual team name
+                "slug": None,  # TODO Replace with actual team slug
+            },
+            "workview_url": annotation_file.workview_url,
         },
     }


+def _build_slots_data(slots: List[dt.Slot]) -> List[Dict[str, Any]]:
+    """
+    Constructs the 'slots' data for the Darwin v2 format annotation.

+    Parameters
+    ----------
+    slots: List[Slot]
+        A list of Slot objects from the AnnotationFile.

+    Returns
+    -------
+    List[Dict[str, Any]]
+        The 'slots' data for the Darwin v2 format annotation.
+    """
+    slots_data = []
+    for slot in slots:
+        slot_data = {
+            "type": slot.type,
+            "slot_name": slot.name,
+            "width": slot.width,
+            "height": slot.height,
+            "thumbnail_url": slot.thumbnail_url,
+            "source_files": slot.source_files,
+        }
+        slots_data.append(slot_data)

+    return slots_data


 @deprecation.deprecated(
     deprecated_in="0.7.8",
     removed_in="0.8.0",
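
To make the per-annotation mapping in `darwin/exporter/formats/darwin.py` easier to review, here is a small standalone sketch of the same branching. `FakeClass`, `FakeAnnotation` and `to_v2` are hypothetical stand-ins for `dt.AnnotationClass`, `dt.Annotation` and `_build_v2_annotation_data`; the logic mirrors the diff as shown, but it is an illustration, not the library code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FakeClass:
    """Hypothetical stand-in for dt.AnnotationClass."""
    name: str
    annotation_type: str


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in for dt.Annotation."""
    id: str
    annotation_class: FakeClass
    data: Dict[str, Any] = field(default_factory=dict)


def to_v2(annotation: FakeAnnotation) -> Dict[str, Any]:
    out: Dict[str, Any] = {
        "id": annotation.id,
        "name": annotation.annotation_class.name,
    }
    kind = annotation.annotation_class.annotation_type
    if kind == "bounding_box":
        # Only the four geometry keys are copied; anything else is dropped.
        out["bounding_box"] = {k: annotation.data.get(k) for k in ("h", "w", "x", "y")}
    elif kind == "tag":
        out["tag"] = {}
    elif kind == "polygon":
        # Each path is a list of points; each point keeps only x and y.
        out["polygon"] = {
            "paths": [
                [{"x": p.get("x"), "y": p.get("y")} for p in path]
                for path in annotation.data.get("paths", [])
            ]
        }
    return out


box = FakeAnnotation(
    "a1", FakeClass("car", "bounding_box"), {"x": 1, "y": 2, "w": 3, "h": 4, "extra": True}
)
poly = FakeAnnotation(
    "a2", FakeClass("road", "polygon"), {"paths": [[{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 1}]]}
)
line = FakeAnnotation("a3", FakeClass("lane", "line"), {"path": []})
```

One consequence visible in the sketch: an annotation whose type is not `bounding_box`, `tag` or `polygon` comes out with only `id` and `name`, since the branching has no fallback case.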
74 changes: 73 additions & 1 deletion darwin/exporter/formats/darwin_1_0.py
@@ -1,5 +1,5 @@
 from pathlib import Path
-from typing import Iterable, List, Union
+from typing import Any, Dict, Iterable, List, Union

 import orjson as json

@@ -213,3 +213,75 @@ def _build_metadata(annotation_file: AnnotationFile) -> DictFreeForm:
         return {"metadata": annotation_file.slots[0].metadata}
     else:
         return {}


+def build_image_annotation(annotation_file: AnnotationFile) -> Dict[str, Any]:
+    """
+    Builds and returns a dictionary with the annotations present in the given file.

+    Parameters
+    ----------
+    annotation_file: dt.AnnotationFile
+        File with the image annotations to extract.

+    Returns
+    -------
+    Dict[str, Any]
+        A dictionary with the annotation from the given file. Has the following structure:

+        .. code-block:: python

+            {
+                "annotations": [
+                    {
+                        "annotation_type": { ... }, # annotation_data
+                        "name": "annotation class name",
+                        "bounding_box": { ... } # Optional parameter, only present if the file has a bounding box as well
+                    }
+                ],
+                "image": {
+                    "filename": "a_file_name.json",
+                    "height": 1000,
+                    "width": 2000,
+                    "url": "https://www.darwin.v7labs.com/..."
+                }
+            }
+    """
+    annotations: List[Dict[str, Any]] = []
+    for annotation in annotation_file.annotations:
+        payload = {
+            annotation.annotation_class.annotation_type: _build_annotation_data(
+                annotation
+            ),
+            "name": annotation.annotation_class.name,
+        }

+        if (
+            annotation.annotation_class.annotation_type == "complex_polygon"
+            or annotation.annotation_class.annotation_type == "polygon"
+        ) and "bounding_box" in annotation.data:
+            payload["bounding_box"] = annotation.data["bounding_box"]

+        annotations.append(payload)

+    return {
+        "annotations": annotations,
+        "image": {
+            "filename": annotation_file.filename,
+            "height": annotation_file.image_height,
+            "width": annotation_file.image_width,
+            "url": annotation_file.image_url,
+        },
+    }


+def _build_annotation_data(annotation: Annotation) -> Dict[str, Any]:
+    if annotation.annotation_class.annotation_type == "complex_polygon":
+        return {"path": annotation.data["paths"]}

+    if annotation.annotation_class.annotation_type == "polygon":
+        return dict(
+            filter(lambda item: item[0] != "bounding_box", annotation.data.items())
+        )

+    return dict(annotation.data)
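
For comparison with the v2 builder, the legacy payload logic in `darwin_1_0.py` can be condensed into one function over plain dictionaries. This is a sketch under the assumption that an annotation is fully described by its type, class name and `data` dict; `build_v1_payload` is a hypothetical name, not part of darwin-py.

```python
from typing import Any, Dict


def build_v1_payload(annotation_type: str, name: str, data: Dict[str, Any]) -> Dict[str, Any]:
    # Body stored under the annotation-type key.
    if annotation_type == "complex_polygon":
        body: Dict[str, Any] = {"path": data["paths"]}
    elif annotation_type == "polygon":
        # The bounding box is lifted out of the polygon body.
        body = {k: v for k, v in data.items() if k != "bounding_box"}
    else:
        body = dict(data)

    payload: Dict[str, Any] = {annotation_type: body, "name": name}

    # Polygons may carry a bounding box, which becomes a sibling key.
    if annotation_type in ("polygon", "complex_polygon") and "bounding_box" in data:
        payload["bounding_box"] = data["bounding_box"]
    return payload


polygon_data = {
    "path": [{"x": 0, "y": 0}, {"x": 4, "y": 0}, {"x": 4, "y": 3}],
    "bounding_box": {"x": 0, "y": 0, "w": 4, "h": 3},
}
payload = build_v1_payload("polygon", "roof", polygon_data)
```

The resulting `payload` has three keys: `polygon` (without the box), `name`, and a top-level `bounding_box`.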
4 changes: 4 additions & 0 deletions darwin/utils/utils.py
@@ -1005,6 +1005,10 @@ def split_video_annotation(annotation: dt.AnnotationFile) -> List[dt.AnnotationF
             for a in annotation.annotations
             if isinstance(a, dt.VideoAnnotation) and i in a.frames
         ]

+        if len(annotations) < 1:
+            continue

         annotation_classes: Set[dt.AnnotationClass] = set(
             [annotation.annotation_class for annotation in annotations]
         )
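
The guard added to `split_video_annotation` means a frame with no annotations produces no per-frame output at all. A rough sketch of that behaviour, assuming each video annotation can be modelled as a mapping from frame index to a value (`frames_with_annotations` is a hypothetical helper, not a darwin-py function):

```python
from typing import Dict, List, Tuple


def frames_with_annotations(
    frame_count: int, video_annotations: List[Dict[int, str]]
) -> List[Tuple[int, List[str]]]:
    kept: List[Tuple[int, List[str]]] = []
    for i in range(frame_count):
        # Collect what each video annotation contributes to frame i.
        present = [a[i] for a in video_annotations if i in a]
        if len(present) < 1:
            continue  # same guard as in the diff: skip empty frames
        kept.append((i, present))
    return kept


# One annotation visible on frames 0 and 2 of a 3-frame video.
result = frames_with_annotations(3, [{0: "first", 2: "last"}])
file_names = [f"{index:07d}.json" for index, _ in result]
```

Here `file_names` contains entries for frames 0 and 2 only, which matches the updated expectations in `tests/darwin/dataset/remote_dataset_test.py`.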
15 changes: 3 additions & 12 deletions tests/darwin/dataset/remote_dataset_test.py
@@ -391,8 +391,10 @@ def test_works_on_videos(
         )
         assert video_path.exists()

+        print(list(video_path.iterdir()))

         assert (video_path / "0000000.json").exists()
-        assert (video_path / "0000001.json").exists()
+        assert not (video_path / "0000001.json").exists()
         assert (video_path / "0000002.json").exists()
         assert not (video_path / "0000003.json").exists()

@@ -418,17 +420,6 @@ def test_works_on_videos(
                 },
             }

-        with (video_path / "0000001.json").open() as f:
-            assert json.loads(f.read()) == {
-                "annotations": [],
-                "image": {
-                    "filename": "test_video/0000001.png",
-                    "height": 1080,
-                    "url": "frame_2.jpg",
-                    "width": 1920,
-                },
-            }

         with (video_path / "0000002.json").open() as f:
             assert json.loads(f.read()) == {
                 "annotations": [