updated with code to use our instrumentation (some more README update… #692

Closed · wants to merge 1 commit
3 changes: 1 addition & 2 deletions image_segmentation/pytorch/Dockerfile
@@ -1,5 +1,4 @@
-ARG FROM_IMAGE_NAME=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime
-#ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.02-py3
+ARG FROM_IMAGE_NAME=pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
 FROM ${FROM_IMAGE_NAME}

 ADD . /workspace/unet3d
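
Only the base image changes here; the container build itself is presumably unchanged. Assuming the `unet3d:latest` tag used by the README's `docker run` command, the build step would look roughly like the following (the exact invocation in the repo's README may differ):

```bash
# Build the container against the updated PyTorch 2.0.1 / CUDA 11.7 base image.
docker build -t unet3d:latest .
```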
17 changes: 16 additions & 1 deletion image_segmentation/pytorch/README.md
@@ -64,7 +64,7 @@ arXiv preprint arXiv:1904.00445 (2019).
```bash
mkdir data
mkdir results
-docker run --ipc=host -it --rm --runtime=nvidia -v RAW-DATA-DIR:/raw_data -v PREPROCESSED-DATA-DIR:/data -v RESULTS-DIR:/results unet3d:latest /bin/bash
+docker run --ipc=host -it --rm --gpus all -v RAW-DATA-DIR:/raw_data -v PREPROCESSED-DATA-DIR:/data -v RESULTS-DIR:/results unet3d:latest /bin/bash
```

3. Preprocess the dataset.
@@ -76,11 +76,26 @@ arXiv preprint arXiv:1904.00445 (2019).

The script will preprocess each volume and save it as a numpy array in `/data`. It will also display statistics such as the volume shape and the mean and standard deviation of the voxel intensities, and it will compute a checksum for each file and compare it with the source.

4. Build PyTorch and TorchVision:

We have not made any functional changes to these libraries, so PyTorch and TorchVision can be built following their standard build instructions. Any issue encountered while building them is unrelated to our instrumentation.

1. The container image already has conda installed, so simply create a new environment.
2. Install PyTorch before TorchVision: if PyTorch is not already installed when TorchVision is built, the TorchVision build will pull PyTorch from the public pytorch channel instead of using our instrumented build.
3. Install PyTorch from https://github.com/rajveerb/pytorch.git using the `v2.0.0_instrumentation` branch.
4. Install TorchVision from https://github.com/rajveerb/vision.git using the `v0.15_instrumentation` branch.
5. You're all set! A minimal build sketch is shown below.
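
The following is a rough, unverified sketch of those steps. The environment name and Python version are placeholders, and the from-source commands follow the upstream PyTorch/TorchVision build instructions; additional dependencies may be needed depending on the base image.

```bash
# Sketch only: consult the upstream PyTorch/TorchVision source-build instructions for details.
conda create -n unet3d_instr python=3.10 -y   # environment name and Python version are placeholders
conda activate unet3d_instr

# Build the instrumented PyTorch fork first, so TorchVision links against it.
git clone --recursive -b v2.0.0_instrumentation https://github.com/rajveerb/pytorch.git
cd pytorch
pip install -r requirements.txt
python setup.py develop
cd ..

# Then build the instrumented TorchVision fork against that PyTorch.
git clone -b v0.15_instrumentation https://github.com/rajveerb/vision.git
cd vision
python setup.py develop
cd ..
```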


## Steps to run and time

The basic command to run on 1 worker takes the following form:
```bash
# Use the following for no logging
bash run_and_time.sh <SEED>
# Use the following to enable logging. Note: logging currently covers only the training path, not validation, though it can easily be extended.
bash run_and_time.sh <SEED> <LOG_FILE_PATH>
```

The script assumes that the data is available in the `/data` directory.
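
For example, a hypothetical invocation with logging enabled (the seed value and log path are arbitrary placeholders):

```bash
# Train with seed 42 and write preprocessing-related timings to a file under /results.
bash run_and_time.sh 42 /results/preprocess_timing.log
```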
2 changes: 1 addition & 1 deletion image_segmentation/pytorch/data_loading/data_loader.py
@@ -83,7 +83,7 @@ def get_data_loaders(flags, num_shards, global_rank):

     elif flags.loader == "pytorch":
         x_train, x_val, y_train, y_val = get_data_split(flags.data_dir, num_shards, shard_id=global_rank)
-        train_data_kwargs = {"patch_size": flags.input_shape, "oversampling": flags.oversampling, "seed": flags.seed}
+        train_data_kwargs = {"patch_size": flags.input_shape, "oversampling": flags.oversampling, "seed": flags.seed, "log_file": flags.log_file}
         train_dataset = PytTrain(x_train, y_train, **train_data_kwargs)
         val_dataset = PytVal(x_val, y_val)
     else:
25 changes: 17 additions & 8 deletions image_segmentation/pytorch/data_loading/pytorch_loader.py
@@ -5,12 +5,14 @@
 from torchvision import transforms


-def get_train_transforms():
+def get_train_transforms(patch_size, oversampling, log_file):
+    load_image = LoadImage()
+    rand_crop = RandBalancedCrop(patch_size=patch_size, oversampling=oversampling)
     rand_flip = RandFlip()
     cast = Cast(types=(np.float32, np.uint8))
     rand_scale = RandomBrightnessAugmentation(factor=0.3, prob=0.1)
     rand_noise = GaussianNoise(mean=0.0, std=0.1, prob=0.1)
-    train_transforms = transforms.Compose([rand_flip, cast, rand_scale, rand_noise])
+    train_transforms = transforms.Compose([load_image, rand_crop, rand_flip, cast, rand_scale, rand_noise], log_transform_elapsed_time=log_file)
     return train_transforms


@@ -134,21 +136,28 @@ def __call__(self, data):
         return data


+class LoadImage:
+    def __init__(self):
+        pass
+    def __call__(self, data):
+        data = {"image": np.load(data['image']), "label": np.load(data['label'])}
+        return data
+
+
 class PytTrain(Dataset):
     def __init__(self, images, labels, **kwargs):
         self.images, self.labels = images, labels
-        self.train_transforms = get_train_transforms()
-        patch_size, oversampling = kwargs["patch_size"], kwargs["oversampling"]
+        patch_size, oversampling, log_file = kwargs["patch_size"], kwargs["oversampling"], kwargs["log_file"]
         self.patch_size = patch_size
-        self.rand_crop = RandBalancedCrop(patch_size=patch_size, oversampling=oversampling)
+        self.log_file = log_file
+        self.train_transforms = get_train_transforms(patch_size, oversampling, log_file)

     def __len__(self):
         return len(self.images)

     def __getitem__(self, idx):
-        data = {"image": np.load(self.images[idx]), "label": np.load(self.labels[idx])}
-        data = self.rand_crop(data)
-        data = self.train_transforms(data)
+        data = {"image": self.images[idx], "label": self.labels[idx]}
+        data = self.train_transforms(data)
         return data["image"], data["label"]
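
For context: `LoadImage` and `RandBalancedCrop` are moved into the transform pipeline so that the instrumented fork's `transforms.Compose(..., log_transform_elapsed_time=...)` can time them alongside the other transforms. Stock torchvision has no such argument; the sketch below is only a rough illustration of how comparable per-transform timing could be approximated with unmodified torchvision, not the fork's actual implementation.

```python
import time


class TimedCompose:
    """Apply transforms in order and append each one's elapsed time to a log file."""

    def __init__(self, transforms_list, log_file=None):
        self.transforms_list = transforms_list
        self.log_file = log_file

    def __call__(self, data):
        for t in self.transforms_list:
            start = time.perf_counter()
            data = t(data)
            elapsed = time.perf_counter() - start
            if self.log_file is not None:
                # Append one line per transform: transform name and elapsed seconds.
                with open(self.log_file, "a") as f:
                    f.write(f"{type(t).__name__}: {elapsed:.6f}s\n")
        return data
```

A production version would likely buffer writes instead of reopening the log file for every sample.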


3 changes: 2 additions & 1 deletion image_segmentation/pytorch/requirements.txt
@@ -1,4 +1,5 @@
 git+https://github.com/NVIDIA/dllogger
 https://github.com/mlcommons/logging/archive/refs/tags/1.1.0-rc4.zip
 nibabel==3.2.1
-scipy==1.5.2
+scipy==1.5.2
+tqdm==4.66.1
18 changes: 17 additions & 1 deletion image_segmentation/pytorch/run_and_time.sh
@@ -16,6 +16,7 @@ LR_WARMUP_EPOCHS=200
 DATASET_DIR="/data"
 BATCH_SIZE=2
 GRADIENT_ACCUMULATION_STEPS=1
+LOG_FILE_PATH=${2}


 if [ -d ${DATASET_DIR} ]
@@ -31,7 +32,21 @@ from mlperf_logging.mllog import constants
from runtime.logging import mllog_event
mllog_event(key=constants.CACHE_CLEAR, value=True)"

-python main.py --data_dir ${DATASET_DIR} \
+if [ $# -eq 2 ]; then
+python main.py --data_dir ${DATASET_DIR} \
+--epochs ${MAX_EPOCHS} \
+--evaluate_every ${EVALUATE_EVERY} \
+--start_eval_at ${START_EVAL_AT} \
+--quality_threshold ${QUALITY_THRESHOLD} \
+--batch_size ${BATCH_SIZE} \
+--optimizer sgd \
+--ga_steps ${GRADIENT_ACCUMULATION_STEPS} \
+--learning_rate ${LEARNING_RATE} \
+--seed ${SEED} \
+--lr_warmup_epochs ${LR_WARMUP_EPOCHS} \
+--log_file ${LOG_FILE_PATH}
+else
+python main.py --data_dir ${DATASET_DIR} \
 --epochs ${MAX_EPOCHS} \
 --evaluate_every ${EVALUATE_EVERY} \
 --start_eval_at ${START_EVAL_AT} \
@@ -42,6 +57,7 @@ mllog_event(key=constants.CACHE_CLEAR, value=True)"
 --learning_rate ${LEARNING_RATE} \
 --seed ${SEED} \
 --lr_warmup_epochs ${LR_WARMUP_EPOCHS}
+fi

# end timing
end=$(date +%s)
2 changes: 2 additions & 0 deletions image_segmentation/pytorch/runtime/arguments.py
@@ -45,3 +45,5 @@
 PARSER.add_argument('--include_background', dest='include_background', action='store_true', default=False)
 PARSER.add_argument('--cudnn_benchmark', dest='cudnn_benchmark', action='store_true', default=False)
 PARSER.add_argument('--cudnn_deterministic', dest='cudnn_deterministic', action='store_true', default=False)
+
+PARSER.add_argument('--log_file', type=str, help='File path to log preprocessing related operations', default=None)