Task killed when training for large zarr file #2

Open · DZGong opened this issue Sep 14, 2023 · 1 comment

DZGong commented Sep 14, 2023

I was trying to train the model on a dataset of 40x4096x4096x3 (NHWC), but the process was always killed, as shown in the snapshot below. This doesn't happen when I switch to a smaller dataset (10x4096x4096); training then runs fine.

[Screenshot: terminal output showing the training process being killed]
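For scale, 40 × 4096 × 4096 × 3 ≈ 2.0 × 10⁹ values, i.e. roughly 2 GB as uint8 or 8 GB as float32 per in-memory copy; the 10-image set is only a quarter of that, which may be why it stays under the memory limit.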

The dataset was originally a single TIFF file and was converted into a zarr file using zarr.convenience.save(). The dataset was then split into train, val, and test sets with multiscale_zarr_data_generator.py, and training was started with run.py.
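For context, the conversion step was roughly equivalent to the sketch below (only the zarr.convenience.save() call is from my setup; the tifffile reader and the paths are illustrative):

```python
import tifffile
import zarr

# Read the original 40x4096x4096x3 stack from the TIFF (path is illustrative).
stack = tifffile.imread("raw_stack.tiff")
print(stack.shape, stack.dtype)

# Write it out as a zarr store; zarr.convenience.save picks default chunking.
zarr.convenience.save("raw_stack.zarr", stack)
```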

The compute system I used has 68 GB(?) of CPU memory and 16 GB of GPU memory, as shown in the snapshot below:
[Screenshot: system memory and GPU specifications]

DZGong commented Sep 14, 2023

This is what follows the `RuntimeError: DataLoader worker (pid 1749166) is killed by signal: Killed.` message:
[Screenshot: traceback following the DataLoader worker error]
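In case it helps with diagnosing: the error pattern looks like the DataLoader worker processes exhausting host memory. Below is a minimal sketch (not from this repo; the store path, patch size, and class name are made up, and it assumes the store holds a single (N, H, W, C) array) of lazy, patch-wise zarr access that avoids materializing the full array in every worker:

```python
import numpy as np
import torch
import zarr
from torch.utils.data import DataLoader, Dataset


class LazyZarrPatches(Dataset):
    """Illustrative dataset that slices patches from a zarr array on demand."""

    def __init__(self, store_path, patch_size=512):
        # zarr.open keeps the data on disk; chunks are only read when sliced.
        self.arr = zarr.open(store_path, mode="r")
        self.patch_size = patch_size
        n, h, w, _ = self.arr.shape
        self.per_row = h // patch_size
        self.per_col = w // patch_size
        self.length = n * self.per_row * self.per_col

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        p = self.patch_size
        per_img = self.per_row * self.per_col
        n = idx // per_img
        rem = idx % per_img
        r, c = rem // self.per_col, rem % self.per_col
        # Only this patch's chunks are loaded from disk.
        patch = self.arr[n, r * p:(r + 1) * p, c * p:(c + 1) * p, :]
        return torch.from_numpy(np.asarray(patch)).permute(2, 0, 1).float()


# Keeping num_workers small also limits how many copies of buffers exist at once.
loader = DataLoader(LazyZarrPatches("train.zarr"), batch_size=4, num_workers=2)
```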
