Image Segmentation: Data Preprocessing Verification -- Checksum fails #545

Closed

nmcglo opened this issue Apr 19, 2022 · 7 comments

@nmcglo commented Apr 19, 2022

The Image Segmentation (PyTorch UNet3D) benchmark relies on the KITS19 dataset. I've followed the instructions from the KITS19 dataset repository for downloading the dataset and have been trying to run the data preprocessing script (https://github.com/mlcommons/training/blob/master/image_segmentation/pytorch/preprocess_dataset.py).

All cases preprocess just fine, but I get an error when the verify_dataset() function is called: at least one case (case 00043 specifically) has an md5 checksum that does not match the expected value from the MLCommons image segmentation repo (https://github.com/mlcommons/training/blob/master/image_segmentation/pytorch/checksum.json). I haven't exhaustively checked every file, but when I run my own md5 hash on the case files, a random sample of about 10 all match the expected values, while the hash for case 00043 does not.

I have downloaded the dataset using both download scripts a total of 7 times and get exactly the same invalid checksum each time, so it isn't a corrupted download (at least not on my end).
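
For reference, here is a minimal sketch of the manual check described above (assuming checksum.json is a flat map from preprocessed file names like case_00043_x.npy to md5 hex digests, which is what the verify_dataset() assertion suggests):

```python
import hashlib
import json
import os

# Sketch of the manual check (assumption: checksum.json maps preprocessed
# file names, e.g. "case_00043_x.npy", to md5 hex digests).
def check_volume(results_dir, volume, checksum_path="checksum.json"):
    with open(checksum_path) as f:
        expected = json.load(f)
    md5 = hashlib.md5()
    with open(os.path.join(results_dir, volume), "rb") as f:
        # Hash in 1 MiB chunks so large .npy volumes need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    digest = md5.hexdigest()
    print(f"{volume}: computed {digest}, expected {expected.get(volume)}")
    return digest == expected.get(volume)

# e.g. check_volume("/data/results", "case_00043_x.npy")
```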

@mmarcinkiewicz (Contributor)

Hi @nmcglohon, is this still a problem for you? I'm going to take a look and try to reproduce it early next week.

@mmarcinkiewicz (Contributor)

I am able to reproduce it. I'll reach out to the dataset owners to ask whether anything has changed.

@nmcglo (Author) commented Dec 2, 2022

Thanks, and apologies for the delayed response; I was away last month.

@sepzjh commented Sep 24, 2023

I have the same problem: I get an error when the verify_dataset() function is called. Has this issue been resolved, or can I skip the function? For example, would a local edit like the sketch below be safe?
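
A hypothetical edit to preprocess_dataset.py (the script calls verify_dataset(args.results_dir) at top level, per the traceback later in this thread) that downgrades the failure to a warning; note that a mismatched file could be genuinely corrupted rather than a stale reference checksum:

```python
# Hypothetical local edit to preprocess_dataset.py: wrap the existing
# top-level verification call so a checksum mismatch prints a warning
# instead of aborting the run. Use with care: the mismatched file may
# be genuinely corrupted, not just a stale entry in checksum.json.
try:
    verify_dataset(args.results_dir)
except AssertionError as err:
    print(f"WARNING: {err} Continuing without full verification.")
```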

@hiwotadese (Contributor)

Closing because we are dropping UNet3D.

@wahabk commented Jul 30, 2024

I have run into this error as well during dataset verification:

```
Case 299. Skipped.
Mean value: -1.850000023841858, std: 0.9800000190734863, d: 256.0, h: 333.0, w: 333.0
  0%|▊         | 2/420 [00:00<01:08,  6.12it/s]
Traceback (most recent call last):
  File "preprocess_dataset.py", line 147, in <module>
    verify_dataset(args.results_dir)
  File "preprocess_dataset.py", line 132, in verify_dataset
    assert md5_hash == source[volume], f"Invalid hash for {volume}."
AssertionError: Invalid hash for case_00183_x.npy.
```

This time it failed on case_00183_x.npy.
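
In case it helps others debugging this, here is a sketch that would list every mismatching file rather than stopping at the first assertion (assuming checksum.json is the filename-to-md5 map read as `source` in verify_dataset, as implied by the traceback above):

```python
import hashlib
import json
import os

# Sketch (assumption: checksum.json is the filename -> md5 map loaded as
# `source` in verify_dataset): report all mismatches instead of asserting.
def report_all_mismatches(results_dir, checksum_path="checksum.json"):
    with open(checksum_path) as f:
        source = json.load(f)
    for volume, ref in sorted(source.items()):
        path = os.path.join(results_dir, volume)
        if not os.path.isfile(path):
            print(f"{volume}: missing")
            continue
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
        if md5.hexdigest() != ref:
            print(f"{volume}: computed {md5.hexdigest()}, expected {ref}")
```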

@hiwotadese Can I please ask why UNet3D is being dropped? Which parts of the MLCommons WG work on training and inference?

@ShriyaPalsamudram (Contributor)

Multiple reasons were taken into consideration before dropping UNet3D from the training benchmark suite. If you are interested, the training WG meets weekly, and decisions about which benchmarks to keep and which to drop are discussed in that forum.

This table lists all the current benchmarks for Training v4.1.

Note that UNet3D is still part of the inference benchmark suite, as listed in this table.
