[Ai-1477]-darwin-v2-export-fix #721

Merged Dec 12, 2023
76 commits
f963e36
fixing test
ChristofferEdlund Oct 16, 2023
08fc262
added polygon and complex polygon tests with bounding boxes
ChristofferEdlund Oct 16, 2023
db3fbde
extended convertion tests
ChristofferEdlund Oct 16, 2023
5d98462
formatter
ChristofferEdlund Oct 17, 2023
69f33f0
updated test to reflect adding of bounding box
ChristofferEdlund Nov 14, 2023
b3c82e0
updated tests to check for new format
ChristofferEdlund Nov 14, 2023
594844a
added additional test for box and tag
ChristofferEdlund Nov 14, 2023
5ff60ae
removed prints
ChristofferEdlund Nov 14, 2023
8ded8db
black reformat
ChristofferEdlund Nov 14, 2023
f45ce38
black format
ChristofferEdlund Nov 14, 2023
828653b
removed an import
ChristofferEdlund Nov 14, 2023
054e4bf
added schema ref
ChristofferEdlund Nov 14, 2023
ef5f7b8
reformated utils
ChristofferEdlund Nov 14, 2023
cfce1bc
added support RemoteDatasetV1 parsing and updated tests
ChristofferEdlund Nov 14, 2023
4de5d9e
black fomrat
ChristofferEdlund Nov 14, 2023
37fb419
additional black magic
ChristofferEdlund Nov 14, 2023
15572bc
merge from master
ChristofferEdlund Nov 15, 2023
a74b006
merge from master
ChristofferEdlund Nov 15, 2023
0914e60
fixed conflic
ChristofferEdlund Nov 15, 2023
3d62044
minor fixes
ChristofferEdlund Nov 15, 2023
ab60ec6
removed ignore of empty files
ChristofferEdlund Nov 15, 2023
b178a6c
updated tests for new (old) behaviour
ChristofferEdlund Nov 15, 2023
c21ab07
added test case
ChristofferEdlund Nov 15, 2023
2991bc8
updated test
ChristofferEdlund Nov 16, 2023
f712ce7
updated code to work with bounding boxes and polygon, also added togg…
ChristofferEdlund Nov 17, 2023
b090f61
black
ChristofferEdlund Nov 17, 2023
2af57b4
adjusting paths to pass tests
ChristofferEdlund Nov 17, 2023
2d9f9d2
black
ChristofferEdlund Nov 17, 2023
20ae224
handling polyg and bbox bounding box anno
ChristofferEdlund Nov 17, 2023
aa65a29
[PY-401][external] Restore Meta Item & ItemQuery functions (#718)
JBWilkie Nov 16, 2023
ab4627a
[PY-402][external] Archive Meta Item & ItemQuery functions (#719)
JBWilkie Nov 16, 2023
96227e5
[PY-403][external] Set Priority Meta Item & ItemQuery functions (#720)
JBWilkie Nov 16, 2023
859de11
automatic ruff --fix changes (#723)
Nathanjp91 Nov 16, 2023
ce7f4d3
minor updates to tests
ChristofferEdlund Nov 17, 2023
c66ceb8
added make polygon tests for darwin_v2 format
ChristofferEdlund Nov 17, 2023
691eed7
updating tests to accomidate darwin V2 format
ChristofferEdlund Nov 17, 2023
7106dc5
black
ChristofferEdlund Nov 17, 2023
179da54
added nifty V2 test
ChristofferEdlund Nov 17, 2023
4c30f17
Merge remote-tracking branch 'origin' into ai-1410-darwin-v2-export-fix
ChristofferEdlund Nov 17, 2023
d0b33e5
black
ChristofferEdlund Nov 17, 2023
09b105e
refactor
ChristofferEdlund Nov 20, 2023
5b1ddc4
ruff --fix
ChristofferEdlund Nov 20, 2023
3a4785a
removed try catch in stacked targets
ChristofferEdlund Nov 20, 2023
ec61777
removed print
ChristofferEdlund Nov 20, 2023
36ca721
Reverted specific files to state in commit 09b105eed12f8d0a3d21beaf62…
ChristofferEdlund Nov 20, 2023
6130abe
converting complex and regular polygon to import format
ChristofferEdlund Nov 21, 2023
9bf55b9
added a potential e2e import fix
ChristofferEdlund Nov 21, 2023
6689d9f
removed debug prints
ChristofferEdlund Nov 21, 2023
d36f7d2
latest sync
ChristofferEdlund Nov 22, 2023
d0f7ca8
updated code to pass e2e tests
ChristofferEdlund Nov 22, 2023
4371111
black and ruff --fix all
ChristofferEdlund Nov 22, 2023
da8bf83
updated formatting
ChristofferEdlund Nov 22, 2023
3f4b740
minor changes to complex polygon
ChristofferEdlund Nov 22, 2023
28024bc
minor fix
ChristofferEdlund Nov 22, 2023
31a8314
black
ChristofferEdlund Nov 22, 2023
12239c8
minor updates based on comments
ChristofferEdlund Nov 22, 2023
c942a81
added PolygonPath and PolygonPaths definitions
ChristofferEdlund Nov 27, 2023
d75dcdd
merge
ChristofferEdlund Nov 27, 2023
3bac7e1
extended convertion tests
ChristofferEdlund Oct 16, 2023
05ed7c8
local changes
ChristofferEdlund Nov 27, 2023
41dd767
reverted changes to internal darwin format, convertion to v2 only don…
ChristofferEdlund Nov 27, 2023
f41a324
black
ChristofferEdlund Nov 27, 2023
dc2ac42
ruff and black
ChristofferEdlund Nov 27, 2023
a50f84e
merged from master
ChristofferEdlund Nov 27, 2023
7175c12
removed settings.json changes
ChristofferEdlund Nov 27, 2023
f06759c
reverting to non-v2 data.zip
ChristofferEdlund Nov 27, 2023
395b2c1
merged darwin_v1_test changes from origin master
ChristofferEdlund Nov 27, 2023
0153c78
changed json stream to normal json
ChristofferEdlund Nov 27, 2023
09b00d2
Fix `RecursionError` in `Item` class (#732)
saurbhc Nov 28, 2023
72bc6fc
commiting changes
ChristofferEdlund Nov 28, 2023
89a8a24
Merge remote-tracking branch 'origin' into ai-1410-darwin-v2-export-fix
ChristofferEdlund Nov 28, 2023
0e095c4
fixed json-stream error when checking for non-empty lists
ChristofferEdlund Nov 28, 2023
28c168c
remove unused import
ChristofferEdlund Nov 28, 2023
e991394
fixed video to image convertion bug when folders are used
ChristofferEdlund Nov 30, 2023
ec92bda
removed debug print
ChristofferEdlund Nov 30, 2023
ce7788c
removed debug print
ChristofferEdlund Nov 30, 2023
59 changes: 20 additions & 39 deletions darwin/dataset/local_dataset.py
@@ -9,6 +9,7 @@
from darwin.utils import (
SUPPORTED_IMAGE_EXTENSIONS,
get_image_path_from_stream,
is_stream_list_empty,
parse_darwin_json,
stream_darwin_json,
)
@@ -67,6 +68,7 @@ def __init__(
split: str = "default",
split_type: str = "random",
release_name: Optional[str] = None,
keep_empty_annotations: bool = False,
):
self.dataset_path = dataset_path
self.annotation_type = annotation_type
@@ -75,10 +77,9 @@ def __init__(
self.original_classes = None
self.original_images_path: Optional[List[Path]] = None
self.original_annotations_path: Optional[List[Path]] = None
self.keep_empty_annotations = keep_empty_annotations

release_path, annotations_dir, images_dir = self._initial_setup(
dataset_path, release_name
)
release_path, annotations_dir, images_dir = self._initial_setup(dataset_path, release_name)
self._validate_inputs(partition, split_type, annotation_type)
# Get the list of classes

@@ -101,6 +102,7 @@ def __init__(
split,
partition,
split_type,
keep_empty_annotations,
)

if len(self.images_path) == 0:
@@ -117,9 +119,7 @@ def _validate_inputs(self, partition, split_type, annotation_type):
if split_type not in ["random", "stratified"]:
raise ValueError("split_type should be either 'random', 'stratified'")
if annotation_type not in ["tag", "polygon", "bounding_box"]:
raise ValueError(
"annotation_type should be either 'tag', 'bounding_box', or 'polygon'"
)
raise ValueError("annotation_type should be either 'tag', 'bounding_box', or 'polygon'")

def _setup_annotations_and_images(
self,
@@ -130,19 +130,21 @@ def _setup_annotations_and_images(
split,
partition,
split_type,
keep_empty_annotations: bool = False,
):
# Find all the annotations and their corresponding images
for annotation_path in sorted(annotations_dir.glob("**/*.json")):
darwin_json = stream_darwin_json(annotation_path)

image_path = get_image_path_from_stream(darwin_json, images_dir)
if image_path.exists():
if not keep_empty_annotations and is_stream_list_empty(darwin_json["annotations"]):
continue
self.images_path.append(image_path)
self.annotations_path.append(annotation_path)
continue
else:
raise ValueError(
f"Annotation ({annotation_path}) does not have a corresponding image"
)
raise ValueError(f"Annotation ({annotation_path}) does not have a corresponding image {image_path}")
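The hunk above skips annotation files whose `annotations` list is empty unless `keep_empty_annotations` is set. A minimal sketch of that rule, assuming plain `json.load` in place of darwin-py's streaming helpers; `collect_annotation_files` is a hypothetical name, and treating a missing `annotations` key as empty is an assumption not confirmed by the diff:

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def collect_annotation_files(annotations_dir: Path, keep_empty_annotations: bool = False) -> List[Path]:
    """Return annotation files, skipping those without annotations unless asked to keep them."""
    kept: List[Path] = []
    for annotation_path in sorted(annotations_dir.glob("**/*.json")):
        with annotation_path.open() as f:
            data = json.load(f)
        # Hypothetical stand-in for is_stream_list_empty:
        # a missing or empty list counts as "no annotations".
        if not keep_empty_annotations and not data.get("annotations"):
            continue
        kept.append(annotation_path)
    return kept


def demo() -> Tuple[List[str], List[str]]:
    """Build two tiny annotation files and list them with and without the flag."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.json").write_text(json.dumps({"annotations": [{"name": "cat"}]}))
        (root / "b.json").write_text(json.dumps({"annotations": []}))
        default = [p.name for p in collect_annotation_files(root)]
        keep_all = [p.name for p in collect_annotation_files(root, keep_empty_annotations=True)]
    return default, keep_all
```

With the default `keep_empty_annotations=False`, only files that carry at least one annotation are listed, which matches the "ignore empty files" behaviour this PR restores as the default.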

def _initial_setup(self, dataset_path, release_name):
assert dataset_path is not None
@@ -201,9 +203,7 @@ def get_height_and_width(self, index: int) -> Tuple[float, float]:
parsed = parse_darwin_json(self.annotations_path[index], index)
return parsed.image_height, parsed.image_width

def extend(
self, dataset: "LocalDataset", extend_classes: bool = False
) -> "LocalDataset":
def extend(self, dataset: "LocalDataset", extend_classes: bool = False) -> "LocalDataset":
"""
Extends the current dataset with another one.

@@ -298,10 +298,7 @@ def parse_json(self, index: int) -> Dict[str, Any]:
# Filter out unused classes and annotations of a different type
if self.classes is not None:
annotations = [
a
for a in annotations
if a.annotation_class.name in self.classes
and self.annotation_type_supported(a)
a for a in annotations if a.annotation_class.name in self.classes and self.annotation_type_supported(a)
]
return {
"image_id": index,
@@ -318,20 +315,15 @@ def annotation_type_supported(self, annotation) -> bool:
elif self.annotation_type == "bounding_box":
is_bounding_box = annotation_type == "bounding_box"
is_supported_polygon = (
annotation_type in ["polygon", "complex_polygon"]
and "bounding_box" in annotation.data
annotation_type in ["polygon", "complex_polygon"] and "bounding_box" in annotation.data
)
return is_bounding_box or is_supported_polygon
elif self.annotation_type == "polygon":
return annotation_type in ["polygon", "complex_polygon"]
else:
raise ValueError(
"annotation_type should be either 'tag', 'bounding_box', or 'polygon'"
)
raise ValueError("annotation_type should be either 'tag', 'bounding_box', or 'polygon'")
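The rule in `annotation_type_supported` can be read as a small decision table: a polygon counts as a bounding box only when its data already contains a `bounding_box` entry. A standalone sketch follows, assuming a simplified `Annotation` dataclass that is hypothetical and not the darwin-py type. The `tag` branch lies outside the visible hunk and is assumed here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Annotation:
    """Hypothetical stand-in for a darwin annotation: a type name plus its data payload."""

    annotation_type: str
    data: Dict[str, Any] = field(default_factory=dict)


def annotation_type_supported(requested: str, annotation: Annotation) -> bool:
    """Return True if `annotation` can be used when the dataset asks for `requested`."""
    kind = annotation.annotation_type
    if requested == "tag":
        # Assumed branch: not shown in the diff hunk.
        return kind == "tag"
    if requested == "bounding_box":
        # Polygons qualify only when they carry a precomputed bounding box.
        is_supported_polygon = kind in ("polygon", "complex_polygon") and "bounding_box" in annotation.data
        return kind == "bounding_box" or is_supported_polygon
    if requested == "polygon":
        return kind in ("polygon", "complex_polygon")
    raise ValueError("annotation_type should be either 'tag', 'bounding_box', or 'polygon'")
```

For example, a polygon without a `bounding_box` key is rejected for a `bounding_box` dataset but accepted for a `polygon` dataset.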

def measure_mean_std(
self, multi_threaded: bool = True
) -> Tuple[np.ndarray, np.ndarray]:
def measure_mean_std(self, multi_threaded: bool = True) -> Tuple[np.ndarray, np.ndarray]:
"""
Computes mean and std of trained images, given the train loader.

@@ -354,9 +346,7 @@ def measure_mean_std(
results = pool.map(self._return_mean, self.images_path)
mean = np.sum(np.array(results), axis=0) / len(self.images_path)
# Online image_classification deviation
results = pool.starmap(
self._return_std, [[item, mean] for item in self.images_path]
)
results = pool.starmap(self._return_std, [[item, mean] for item in self.images_path])
std_sum = np.sum(np.array([item[0] for item in results]), axis=0)
total_pixel_count = np.sum(np.array([item[1] for item in results]))
std = np.sqrt(std_sum / total_pixel_count)
@@ -402,20 +392,14 @@ def _compute_weights(labels: List[int]) -> np.ndarray:
@staticmethod
def _return_mean(image_path: Path) -> np.ndarray:
img = np.array(load_pil_image(image_path))
mean = np.array(
[np.mean(img[:, :, 0]), np.mean(img[:, :, 1]), np.mean(img[:, :, 2])]
)
mean = np.array([np.mean(img[:, :, 0]), np.mean(img[:, :, 1]), np.mean(img[:, :, 2])])
return mean / 255.0

# Loads an image with OpenCV and returns the channel wise std of the image.
@staticmethod
def _return_std(image_path: Path, mean: np.ndarray) -> Tuple[np.ndarray, float]:
img = np.array(load_pil_image(image_path)) / 255.0
m2 = np.square(
np.array(
[img[:, :, 0] - mean[0], img[:, :, 1] - mean[1], img[:, :, 2] - mean[2]]
)
)
m2 = np.square(np.array([img[:, :, 0] - mean[0], img[:, :, 1] - mean[1], img[:, :, 2] - mean[2]]))
return np.sum(np.sum(m2, axis=1), 1), m2.size / 3.0

def __getitem__(self, index: int):
@@ -485,10 +469,7 @@ def build_stems(
"""

if partition is None:
return (
str(e.relative_to(annotations_dir).parent / e.stem)
for e in sorted(annotations_dir.glob("**/*.json"))
)
return (str(e.relative_to(annotations_dir).parent / e.stem) for e in sorted(annotations_dir.glob("**/*.json")))

if split_type == "random":
split_filename = f"{split_type}_{partition}.txt"
7 changes: 6 additions & 1 deletion darwin/dataset/remote_dataset.py
@@ -161,7 +161,7 @@ def split_video_annotations(self, release_name: str = "latest") -> None:

frame_annotations = split_video_annotation(darwin_annotation)
for frame_annotation in frame_annotations:
annotation = build_image_annotation(frame_annotation)
annotation = self._build_image_annotation(frame_annotation)

video_frame_annotations_path = annotations_path / annotation_file.stem
video_frame_annotations_path.mkdir(exist_ok=True, parents=True)
@@ -947,3 +947,8 @@ def local_images_path(self) -> Path:
def identifier(self) -> DatasetIdentifier:
"""The ``DatasetIdentifier`` of this ``RemoteDataset``."""
return DatasetIdentifier(team_slug=self.team, dataset_slug=self.slug)

def _build_image_annotation(
self, annotation_file: AnnotationFile
) -> Dict[str, Any]:
return build_image_annotation(annotation_file)
8 changes: 7 additions & 1 deletion darwin/dataset/remote_dataset_v1.py
@@ -12,8 +12,9 @@
UploadHandlerV1,
)
from darwin.dataset.utils import is_relative_to
from darwin.datatypes import ItemId, PathLike
from darwin.datatypes import AnnotationFile, ItemId, PathLike
from darwin.exceptions import NotFound, ValidationError
from darwin.exporter.formats.darwin_1_0 import build_image_annotation
from darwin.item import DatasetItem
from darwin.item_sorter import ItemSorter
from darwin.utils import find_files, urljoin
@@ -512,3 +513,8 @@ def import_annotation(self, item_id: ItemId, payload: Dict[str, Any]) -> None:
"""

self.client.import_annotation(item_id, payload=payload)

def _build_image_annotation(
self, annotation_file: AnnotationFile
) -> Dict[str, Any]:
return build_image_annotation(annotation_file)
8 changes: 7 additions & 1 deletion darwin/dataset/remote_dataset_v2.py
@@ -22,8 +22,9 @@
UploadHandlerV2,
)
from darwin.dataset.utils import is_relative_to
from darwin.datatypes import ItemId, PathLike
from darwin.datatypes import AnnotationFile, ItemId, PathLike
from darwin.exceptions import NotFound, UnknownExportVersion
from darwin.exporter.formats.darwin import build_image_annotation
from darwin.item import DatasetItem
from darwin.item_sorter import ItemSorter
from darwin.utils import find_files, urljoin
@@ -543,3 +544,8 @@ def _fetch_stages(self, stage_type):
workflow_id,
[stage for stage in workflow["stages"] if stage["type"] == stage_type],
)

def _build_image_annotation(
self, annotation_file: AnnotationFile
) -> Dict[str, Any]:
return build_image_annotation(annotation_file)
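Across the three dataset files, the PR routes frame export through a `_build_image_annotation` method so that each dataset version can pick its own exporter. The sketch below illustrates that hook pattern in isolation. The builder functions, the dictionary shapes they return, the `split_frames` method, and the abstract base class are all hypothetical simplifications; in the diff the base `RemoteDataset` supplies its own default instead of raising.

```python
from typing import Any, Dict, List


def build_v1(annotation_file: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical v1-style payload, invented for illustration only.
    return {"image": {"filename": annotation_file["filename"]}, "annotations": annotation_file["annotations"]}


def build_v2(annotation_file: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical v2-style payload, invented for illustration only.
    return {"version": "2.0", "item": {"name": annotation_file["filename"]}, "annotations": annotation_file["annotations"]}


class RemoteDataset:
    def _build_image_annotation(self, annotation_file: Dict[str, Any]) -> Dict[str, Any]:
        # Simplification: subclasses must choose an exporter.
        raise NotImplementedError

    def split_frames(self, frames: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Shared logic calls the hook, so the output format follows the subclass.
        return [self._build_image_annotation(frame) for frame in frames]


class RemoteDatasetV1(RemoteDataset):
    def _build_image_annotation(self, annotation_file: Dict[str, Any]) -> Dict[str, Any]:
        return build_v1(annotation_file)


class RemoteDatasetV2(RemoteDataset):
    def _build_image_annotation(self, annotation_file: Dict[str, Any]) -> Dict[str, Any]:
        return build_v2(annotation_file)
```

Keeping the shared splitting logic in the base class and overriding only the builder means the video-to-frame code does not need to know which export format is in use.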