ValueError: When using `save_weights_only=True` in `ModelCheckpoint`, the filepath provided must end in `.weights.h5` (Keras weights format). #55

ptlzon · 2024-03-12T05:36:54Z

When trying to train the demo file, I got this error.

(napari-gpu-py39) xxx@hpcg01:~$ napari
2024-03-12 13:35:31.656562: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-12 13:35:31.657001: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-12 13:35:31.660024: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-12 13:35:31.697270: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-12 13:35:32.441568: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-12 13:35:33.010855: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
INFO: Downloading data can take a few minutes.
INFO: Loading data
INFO: Shaping data
Generated patches: (392, 64, 64, 1)
Train patches: (387, 64, 64, 1)
Val patches: (5, 64, 64, 1)
INFO: Creating model
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/superqt/utils/_qthreading.py:617, in create_worker.<locals>.reraise(e=ValueError('When using `save_weights_only=True` ...36650/.napari/N2V/models/n2v_2D/weights_best.h5'))
    616 def reraise(e):
--> 617     raise e
        e = ValueError('When using `save_weights_only=True` in `ModelCheckpoint`, the filepath provided must end in `.weights.h5` (Keras weights format). Received: filepath=/home/xxx/.napari/N2V/models/n2v_2D/weights_best.h5')

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/superqt/utils/_qthreading.py:178, in WorkerBase.run(self=<napari._qt.qthreading.GeneratorWorker object>)
    176     warnings.filterwarnings("always")
    177     warnings.showwarning = lambda *w: self.warned.emit(w)
--> 178     result = self.work()
        self = <napari._qt.qthreading.GeneratorWorker object at 0x7f272362d550>
    179 if isinstance(result, Exception):
    180     if isinstance(result, RuntimeError):
    181         # The Worker object has likely been deleted.
    182         # A deleted wrapped C/C++ object may result in a runtime
    183         # error that will cause segfault if we try to do much other
    184         # than simply notify the user.

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/superqt/utils/_qthreading.py:444, in GeneratorWorker.work(self=<napari._qt.qthreading.GeneratorWorker object>)
    442 try:
    443     _input = self._next_value()
--> 444     output = self._gen.send(_input)
        self = <napari._qt.qthreading.GeneratorWorker object at 0x7f272362d550>
        _input = None
        self._gen = <generator object train_worker at 0x7f272362e040>
    445     self.yielded.emit(output)
    446 except StopIteration as exc:

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/napari_n2v/utils/training_worker.py:111, in train_worker(widget=<napari_n2v._train_widget.TrainWidget object>, pretrained_model=None, expert_settings=None)
    108 widget.weights_path = Path(base_dir, model_name, 'weights_best.h5').absolute()
    110 try:
--> 111     model = create_model(X_train,
        X_train = <class 'numpy.ndarray'> (387, 64, 64, 1) float32
        n_epochs = 30
        n_steps = 200
        batch_size = 16
        model_name = 'n2v_2D'
        base_dir = PosixPath('models')
        updater = <napari_n2v.utils.training_worker.Updater object at 0x7f2723633190>
        expert_settings = None
    112                          n_epochs,
    113                          n_steps,
    114                          batch_size,
    115                          model_name,
    116                          base_dir.absolute(),
    117                          updater,
    118                          expert_settings=expert_settings)
    119 except InternalError as e:
    120     print(e.message)

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/napari_n2v/utils/n2v_utils.py:181, in create_model(X_patches=<class 'numpy.ndarray'> (387, 64, 64, 1) float32, n_epochs=30, n_steps=200, batch_size=16, model_name='n2v_2D', basedir=PosixPath('/home/xxx/.napari/N2V/models'), updater=<napari_n2v.utils.training_worker.Updater object>, expert_settings=None, train=True)
    178 model = N2V(config, model_name, basedir=basedir)
    180 if train:
--> 181     model.prepare_for_training(metrics={})
        model = N2V(n2v_2D): YXC → YXC
├─ Directory: /home/xxx/.napari/N2V/models/n2v_2D
└─ N2VConfig(means=['46921.984'], stds=['16851.47'], n_dim=2, axes='YXC', n_channel_in=1, n_channel_out=1, unet_residual=False, unet_n_depth=2, unet_kern_size=5, unet_n_first=32, unet_last_activation='linear', unet_input_shape=(None, None, 1), train_loss='mse', train_epochs=30, train_steps_per_epoch=200, train_learning_rate=0.0004, train_batch_size=16, train_tensorboard=True, train_checkpoint='weights_best.h5', train_reduce_lr={'factor': 0.5, 'patience': 10}, batch_norm=True, n2v_perc_pix=0.198, n2v_patch_shape=[64, 64], n2v_manipulator='uniform_withCP', n2v_neighborhood_radius=5, single_net_per_channel=True, blurpool=False, skip_skipone=False, structN2Vmask=None, probabilistic=False)
    183 # add updater
    184 if updater:

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/n2v/models/n2v_standard.py:302, in N2V.prepare_for_training(self=N2V(n2v_2D): YXC → YXC
├─ Directory: /home/xxx6...e=False, structN2Vmask=None, probabilistic=False), optimizer=<keras.src.optimizers.adam.Adam object>, **kwargs={'metrics': {}})
    299 if self.config.train_checkpoint is not None:
    300     from tensorflow.keras.callbacks import ModelCheckpoint
    301     self.callbacks.append(
--> 302         ModelCheckpoint(str(self.logdir / self.config.train_checkpoint), save_best_only=True,
        self.callbacks = [<keras.src.callbacks.terminate_on_nan.TerminateOnNaN object at 0x7f28b85a2df0>]
        self = N2V(n2v_2D): YXC → YXC
├─ Directory: /home/xxx/.napari/N2V/models/n2v_2D
└─ N2VConfig(means=['46921.984'], stds=['16851.47'], n_dim=2, axes='YXC', n_channel_in=1, n_channel_out=1, unet_residual=False, unet_n_depth=2, unet_kern_size=5, unet_n_first=32, unet_last_activation='linear', unet_input_shape=(None, None, 1), train_loss='mse', train_epochs=30, train_steps_per_epoch=200, train_learning_rate=0.0004, train_batch_size=16, train_tensorboard=True, train_checkpoint='weights_best.h5', train_reduce_lr={'factor': 0.5, 'patience': 10}, batch_norm=True, n2v_perc_pix=0.198, n2v_patch_shape=[64, 64], n2v_manipulator='uniform_withCP', n2v_neighborhood_radius=5, single_net_per_channel=True, blurpool=False, skip_skipone=False, structN2Vmask=None, probabilistic=False)
        self.config = N2VConfig(means=['46921.984'], stds=['16851.47'], n_dim=2, axes='YXC', n_channel_in=1, n_channel_out=1, unet_residual=False, unet_n_depth=2, unet_kern_size=5, unet_n_first=32, unet_last_activation='linear', unet_input_shape=(None, None, 1), train_loss='mse', train_epochs=30, train_steps_per_epoch=200, train_learning_rate=0.0004, train_batch_size=16, train_tensorboard=True, train_checkpoint='weights_best.h5', train_reduce_lr={'factor': 0.5, 'patience': 10}, batch_norm=True, n2v_perc_pix=0.198, n2v_patch_shape=[64, 64], n2v_manipulator='uniform_withCP', n2v_neighborhood_radius=5, single_net_per_channel=True, blurpool=False, skip_skipone=False, structN2Vmask=None, probabilistic=False)
        self.config.train_checkpoint = 'weights_best.h5'
        self.logdir = PosixPath('/home/xxx/.napari/N2V/models/n2v_2D')
    303                         save_weights_only=True))
    304     self.callbacks.append(
    305         ModelCheckpoint(str(self.logdir / 'weights_now.h5'), save_best_only=False, save_weights_only=True))
    307 if self.config.train_tensorboard:

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/keras/src/callbacks/model_checkpoint.py:183, in ModelCheckpoint.__init__(self=<keras.src.callbacks.model_checkpoint.ModelCheckpoint object>, filepath='/home/xxx/.napari/N2V/models/n2v_2D/weights_best.h5', monitor='val_loss', verbose=0, save_best_only=True, save_weights_only=True, mode='auto', save_freq='epoch', initial_value_threshold=None)
    181 if save_weights_only:
    182     if not self.filepath.endswith(".weights.h5"):
--> 183         raise ValueError(
    184             "When using `save_weights_only=True` in `ModelCheckpoint`"
    185             ", the filepath provided must end in `.weights.h5` "
    186             "(Keras weights format). Received: "
    187             f"filepath={self.filepath}"
    188         )
    189 else:
    190     if not self.filepath.endswith(".keras"):

ValueError: When using `save_weights_only=True` in `ModelCheckpoint`, the filepath provided must end in `.weights.h5` (Keras weights format). Received: filepath=/home/xxx/.napari/N2V/models/n2v_2D/weights_best.h5
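For readers skimming the traceback, the failing check comes down to a filename suffix test. The sketch below mirrors that test in plain Python and shows how a legacy name such as `weights_best.h5` would have to look to pass it. The names `check_weights_filepath` and `to_keras3_weights_name` are hypothetical helpers written for illustration; they are not part of Keras, n2v, or napari-n2v. Per the maintainer's later comment in this thread, renaming the checkpoint is not sufficient on its own to make n2v work with newer TensorFlow.

```python
# Illustrative sketch only: mirrors the suffix check visible in the traceback.
# Both functions are hypothetical helpers, not Keras or n2v APIs.

KERAS_WEIGHTS_SUFFIX = ".weights.h5"


def check_weights_filepath(filepath: str) -> None:
    """Raise ValueError if the path would fail the `save_weights_only=True` check."""
    if not str(filepath).endswith(KERAS_WEIGHTS_SUFFIX):
        raise ValueError(
            "When using `save_weights_only=True`, the filepath must end in "
            f"`{KERAS_WEIGHTS_SUFFIX}`. Received: filepath={filepath}"
        )


def to_keras3_weights_name(filename: str) -> str:
    """Turn a legacy name like 'weights_best.h5' into 'weights_best.weights.h5'."""
    if filename.endswith(KERAS_WEIGHTS_SUFFIX):
        return filename  # already in the expected form
    if filename.endswith(".h5"):
        return filename[: -len(".h5")] + KERAS_WEIGHTS_SUFFIX
    return filename + KERAS_WEIGHTS_SUFFIX
```

In the run above, the path is built from `train_checkpoint='weights_best.h5'` in the n2v config, which is why the check fails before training starts.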


bpavie · 2024-03-13T14:44:06Z

I am getting the same error when starting training.

jdeschamps · 2024-03-14T15:14:38Z

Hi,

Thanks for posting the issue. Which TF version do you use?

I suspect it is newer than the version the plugin has been tested with. This seems simple enough to fix. We will try to push a fix asap.
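One way to answer the version question is to list the installed versions of the relevant packages from inside the same environment. This is a minimal sketch using only the standard library; the package list is an assumption about which distributions matter here (in particular, `napari-n2v` is assumed to be the plugin's distribution name), so adjust it as needed.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of distributions relevant to this issue; edit as needed.
PACKAGES = ("tensorflow", "keras", "n2v", "csbdeep", "napari-n2v")


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this with the Python interpreter of the environment that launches napari gives the versions to paste into a report.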

ptlzon · 2024-03-15T01:43:27Z

Thanks for your response, @jdeschamps.
That seems to be the problem.

My conda environment has

cudatoolkit   11.1.74
cudnn         8.0.4

and my pip environment has

tensorflow    2.16.1

Thanks a lot.

comatose-tortoise · 2024-03-22T15:13:07Z

Any progress on this?

jdeschamps · 2024-04-06T15:54:40Z

Hi,

So it is not a quick fix after all: it seems that n2v (and potentially csbdeep) is not compatible with newer TF versions (juglab/n2v#150). Since we are not actively developing n2v*, you will have to use an older version of TF, which is always tricky and annoying. I went back through the old TF installation pages and copied the installation instructions into the n2v README.

* we are not developing it actively because we are working on its successor, which is PyTorch based. We will announce it on the n2v repo and here, and later this year we will update the napari plugin.
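To tell whether an environment sits on the newer side of this change, one can compare the TensorFlow version string against 2.16, the release in which Keras 3 became the bundled default. The sketch below is an illustration under that assumption; `ships_keras3_by_default` is a hypothetical helper, and the exact range of TF versions that n2v supports should be taken from the n2v README rather than from this check.

```python
def parse_major_minor(version_string: str) -> tuple:
    """Extract (major, minor) as integers from a string like '2.16.1'."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def ships_keras3_by_default(tf_version: str) -> bool:
    """True for TensorFlow >= 2.16 (assumed cutoff where Keras 3 is the default)."""
    return parse_major_minor(tf_version) >= (2, 16)


# Typical use inside the environment, if TensorFlow is importable:
#   import tensorflow as tf
#   print(ships_keras3_by_default(tf.__version__))
```

For the environment reported earlier in this thread (tensorflow 2.16.1), the check returns `True`, which is consistent with the error being raised.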

comatose-tortoise · 2024-04-06T19:57:29Z

> we are not developing it actively because we are working on its successor, which is PyTorch based. We will announce it on the n2v repo and here, and later this year we will update the napari plugin.

Awesome! Will it have native support for Apple Silicon through the PyTorch-Metal project?

jdeschamps · 2024-04-08T09:50:43Z

We definitely hope so, but that does depend more on Apple and Facebook than on us. :)
