Image Segmentation: Data Preprocessing Verification -- Checksum fails #545
Comments
Hi @nmcglohon, is this still a problem for you? I'm going to take a look and try to repro early next week.
I am able to repro. I'll reach out to the dataset owners asking for clarification on whether anything has changed.
Thanks, and apologies for the delay in responding - I was away last month.
I have the same problem: I get an error when the verify_dataset() function is called. Has this issue been resolved, or can I skip the function?
Closing because we are dropping UNet3D |
I have run into this error as well during dataset verification:
@hiwotadese Can I please ask why UNet3D is being dropped? Which part of the MLCommons WG works on training and inference?
There were multiple reasons taken into consideration before dropping unet3d from the training benchmark suite. In case you are interested, the training WG meets weekly, and decisions about which benchmarks to keep and which to drop are discussed in that forum. This table lists all the current benchmarks for Training v4.1. Note that unet3d is still part of the inference benchmark suite, as listed in this table.
The Image Segmentation (PyTorch UNet3D) benchmark relies on the KITS19 dataset. I've followed the instructions from the KITS19 dataset repository for downloading the dataset and have been trying to run the data preprocessing script (https://github.com/mlcommons/training/blob/master/image_segmentation/pytorch/preprocess_dataset.py).
All of the cases preprocess just fine, but I get an error when the verify_dataset() function is called. At least one of the cases (Case 00043 specifically) has an MD5 checksum that does not match the expected value from the MLCommons image segmentation repo (https://github.com/mlcommons/training/blob/master/image_segmentation/pytorch/checksum.json). I haven't exhaustively checked each of them, but when I computed my own MD5 hashes on these case files, a random sample of 10 or so all matched the expected values, while the hash for Case 00043 did not.
I have downloaded the dataset using both download scripts a total of 7 times and get the exact same invalid checksum each time, so it isn't a corrupted download (at least on my end).
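For anyone who wants to reproduce the manual check described above, here is a minimal sketch of comparing per-file MD5 digests against a JSON manifest. It assumes (hypothetically) that the manifest maps file names to MD5 hex digests, as checksum.json appears to; the function names `md5_of` and `check_against_manifest` are my own, not part of the benchmark repo.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_against_manifest(data_dir: str, manifest_path: str) -> list[str]:
    """Return the names of files that are missing or whose MD5 digest
    does not match the manifest (assumed to map name -> hex digest)."""
    expected = json.loads(Path(manifest_path).read_text())
    mismatches = []
    for name, hex_digest in expected.items():
        f = Path(data_dir) / name
        if not f.exists() or md5_of(f) != hex_digest:
            mismatches.append(name)
    return mismatches
```

Running something like `check_against_manifest("preprocessed_data", "checksum.json")` should then list only the offending case files, which makes it easy to tell a single stale manifest entry apart from a corrupted download.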