ValueError: When using save_weights_only=True in ModelCheckpoint, the filepath provided must end in .weights.h5 (Keras weights format). #55

Open
ptlzon opened this issue Mar 12, 2024 · 7 comments

ptlzon commented Mar 12, 2024

When trying to train the demo file, I got this error.

(napari-gpu-py39) xxx@hpcg01:~$ napari
2024-03-12 13:35:31.656562: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-12 13:35:31.657001: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-12 13:35:31.660024: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-12 13:35:31.697270: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-12 13:35:32.441568: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-12 13:35:33.010855: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
INFO: Downloading data can take a few minutes.
INFO: Loading data
INFO: Shaping data
Generated patches: (392, 64, 64, 1)
Train patches: (387, 64, 64, 1)
Val patches: (5, 64, 64, 1)
INFO: Creating model
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/superqt/utils/_qthreading.py:617, in create_worker.<locals>.reraise(e=ValueError('When using `save_weights_only=True` ...36650/.napari/N2V/models/n2v_2D/weights_best.h5'))
    616 def reraise(e):
--> 617     raise e
        e = ValueError('When using `save_weights_only=True` in `ModelCheckpoint`, the filepath provided must end in `.weights.h5` (Keras weights format). Received: filepath=/home/xxx/.napari/N2V/models/n2v_2D/weights_best.h5')

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/superqt/utils/_qthreading.py:178, in WorkerBase.run(self=<napari._qt.qthreading.GeneratorWorker object>)
    176     warnings.filterwarnings("always")
    177     warnings.showwarning = lambda *w: self.warned.emit(w)
--> 178     result = self.work()
        self = <napari._qt.qthreading.GeneratorWorker object at 0x7f272362d550>
    179 if isinstance(result, Exception):
    180     if isinstance(result, RuntimeError):
    181         # The Worker object has likely been deleted.
    182         # A deleted wrapped C/C++ object may result in a runtime
    183         # error that will cause segfault if we try to do much other
    184         # than simply notify the user.

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/superqt/utils/_qthreading.py:444, in GeneratorWorker.work(self=<napari._qt.qthreading.GeneratorWorker object>)
    442 try:
    443     _input = self._next_value()
--> 444     output = self._gen.send(_input)
        self = <napari._qt.qthreading.GeneratorWorker object at 0x7f272362d550>
        _input = None
        self._gen = <generator object train_worker at 0x7f272362e040>
    445     self.yielded.emit(output)
    446 except StopIteration as exc:

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/napari_n2v/utils/training_worker.py:111, in train_worker(widget=<napari_n2v._train_widget.TrainWidget object>, pretrained_model=None, expert_settings=None)
    108 widget.weights_path = Path(base_dir, model_name, 'weights_best.h5').absolute()
    110 try:
--> 111     model = create_model(X_train,
        X_train = <class 'numpy.ndarray'> (387, 64, 64, 1) float32
        n_epochs = 30
        n_steps = 200
        batch_size = 16
        model_name = 'n2v_2D'
        base_dir = PosixPath('models')
        updater = <napari_n2v.utils.training_worker.Updater object at 0x7f2723633190>
        expert_settings = None
    112                          n_epochs,
    113                          n_steps,
    114                          batch_size,
    115                          model_name,
    116                          base_dir.absolute(),
    117                          updater,
    118                          expert_settings=expert_settings)
    119 except InternalError as e:
    120     print(e.message)

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/napari_n2v/utils/n2v_utils.py:181, in create_model(X_patches=<class 'numpy.ndarray'> (387, 64, 64, 1) float32, n_epochs=30, n_steps=200, batch_size=16, model_name='n2v_2D', basedir=PosixPath('/home/xxx/.napari/N2V/models'), updater=<napari_n2v.utils.training_worker.Updater object>, expert_settings=None, train=True)
    178 model = N2V(config, model_name, basedir=basedir)
    180 if train:
--> 181     model.prepare_for_training(metrics={})
        model = N2V(n2v_2D): YXC → YXC
├─ Directory: /home/xxx/.napari/N2V/models/n2v_2D
└─ N2VConfig(means=['46921.984'], stds=['16851.47'], n_dim=2, axes='YXC', n_channel_in=1, n_channel_out=1, unet_residual=False, unet_n_depth=2, unet_kern_size=5, unet_n_first=32, unet_last_activation='linear', unet_input_shape=(None, None, 1), train_loss='mse', train_epochs=30, train_steps_per_epoch=200, train_learning_rate=0.0004, train_batch_size=16, train_tensorboard=True, train_checkpoint='weights_best.h5', train_reduce_lr={'factor': 0.5, 'patience': 10}, batch_norm=True, n2v_perc_pix=0.198, n2v_patch_shape=[64, 64], n2v_manipulator='uniform_withCP', n2v_neighborhood_radius=5, single_net_per_channel=True, blurpool=False, skip_skipone=False, structN2Vmask=None, probabilistic=False)
    183 # add updater
    184 if updater:

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/n2v/models/n2v_standard.py:302, in N2V.prepare_for_training(self=N2V(n2v_2D): YXC → YXC
├─ Directory: /home/xxx6...e=False, structN2Vmask=None, probabilistic=False), optimizer=<keras.src.optimizers.adam.Adam object>, **kwargs={'metrics': {}})
    299 if self.config.train_checkpoint is not None:
    300     from tensorflow.keras.callbacks import ModelCheckpoint
    301     self.callbacks.append(
--> 302         ModelCheckpoint(str(self.logdir / self.config.train_checkpoint), save_best_only=True,
        self.callbacks = [<keras.src.callbacks.terminate_on_nan.TerminateOnNaN object at 0x7f28b85a2df0>]
        self = N2V(n2v_2D): YXC → YXC
├─ Directory: /home/xxx/.napari/N2V/models/n2v_2D
└─ N2VConfig(means=['46921.984'], stds=['16851.47'], n_dim=2, axes='YXC', n_channel_in=1, n_channel_out=1, unet_residual=False, unet_n_depth=2, unet_kern_size=5, unet_n_first=32, unet_last_activation='linear', unet_input_shape=(None, None, 1), train_loss='mse', train_epochs=30, train_steps_per_epoch=200, train_learning_rate=0.0004, train_batch_size=16, train_tensorboard=True, train_checkpoint='weights_best.h5', train_reduce_lr={'factor': 0.5, 'patience': 10}, batch_norm=True, n2v_perc_pix=0.198, n2v_patch_shape=[64, 64], n2v_manipulator='uniform_withCP', n2v_neighborhood_radius=5, single_net_per_channel=True, blurpool=False, skip_skipone=False, structN2Vmask=None, probabilistic=False)
        self.config = N2VConfig(means=['46921.984'], stds=['16851.47'], n_dim=2, axes='YXC', n_channel_in=1, n_channel_out=1, unet_residual=False, unet_n_depth=2, unet_kern_size=5, unet_n_first=32, unet_last_activation='linear', unet_input_shape=(None, None, 1), train_loss='mse', train_epochs=30, train_steps_per_epoch=200, train_learning_rate=0.0004, train_batch_size=16, train_tensorboard=True, train_checkpoint='weights_best.h5', train_reduce_lr={'factor': 0.5, 'patience': 10}, batch_norm=True, n2v_perc_pix=0.198, n2v_patch_shape=[64, 64], n2v_manipulator='uniform_withCP', n2v_neighborhood_radius=5, single_net_per_channel=True, blurpool=False, skip_skipone=False, structN2Vmask=None, probabilistic=False)
        self.config.train_checkpoint = 'weights_best.h5'
        self.logdir = PosixPath('/home/xxx/.napari/N2V/models/n2v_2D')
    303                         save_weights_only=True))
    304     self.callbacks.append(
    305         ModelCheckpoint(str(self.logdir / 'weights_now.h5'), save_best_only=False, save_weights_only=True))
    307 if self.config.train_tensorboard:

File ~/miniconda3/envs/napari-gpu-py39/lib/python3.9/site-packages/keras/src/callbacks/model_checkpoint.py:183, in ModelCheckpoint.__init__(self=<keras.src.callbacks.model_checkpoint.ModelCheckpoint object>, filepath='/home/xxx/.napari/N2V/models/n2v_2D/weights_best.h5', monitor='val_loss', verbose=0, save_best_only=True, save_weights_only=True, mode='auto', save_freq='epoch', initial_value_threshold=None)
    181 if save_weights_only:
    182     if not self.filepath.endswith(".weights.h5"):
--> 183         raise ValueError(
    184             "When using `save_weights_only=True` in `ModelCheckpoint`"
    185             ", the filepath provided must end in `.weights.h5` "
    186             "(Keras weights format). Received: "
    187             f"filepath={self.filepath}"
    188         )
    189 else:
    190     if not self.filepath.endswith(".keras"):

ValueError: When using `save_weights_only=True` in `ModelCheckpoint`, the filepath provided must end in `.weights.h5` (Keras weights format). Received: filepath=/home/xxx/.napari/N2V/models/n2v_2D/weights_best.h5

bpavie commented Mar 13, 2024

I am getting the same error when starting the training.

@jdeschamps
Member

Hi,

Thanks for posting the issue. Which TF version do you use?

I suspect it is a newer version than the ones the plugin has been tested with. This seems simple enough to fix, and we will try to push a fix ASAP.
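
For reference, the check that raises in the traceback above (`keras/src/callbacks/model_checkpoint.py`, lines 181-188) can be mirrored in plain Python. This is a minimal sketch, not the Keras source: `check_weights_filepath` and `to_weights_filename` are hypothetical helpers, and whether renaming the checkpoint is enough for n2v to work with this Keras version is not established here.

```python
def check_weights_filepath(filepath):
    """Mirror of the suffix check shown in the traceback (not the Keras code itself)."""
    if not filepath.endswith(".weights.h5"):
        raise ValueError(
            "When using `save_weights_only=True` in `ModelCheckpoint`"
            ", the filepath provided must end in `.weights.h5` "
            "(Keras weights format). Received: "
            f"filepath={filepath}"
        )
    return filepath


def to_weights_filename(name):
    """Hypothetical helper: map 'weights_best.h5' to 'weights_best.weights.h5'."""
    if name.endswith(".weights.h5"):
        return name  # already in the required form
    if name.endswith(".h5"):
        return name[: -len(".h5")] + ".weights.h5"
    return name + ".weights.h5"
```

With the file name from the traceback, `to_weights_filename("weights_best.h5")` gives a name that passes the check, while the original name raises the same `ValueError`.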

Author

ptlzon commented Mar 15, 2024

Thanks for your response, @jdeschamps.
That seems to be the problem.

My conda environment has

cudatoolkit   11.1.74
cudnn         8.0.4

and my pip environment has

tensorflow    2.16.1

Thanks a lot.
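
For anyone comparing their own setup, here is a minimal sketch of a version check. It assumes the stricter file-name rule arrives with Keras 3, which became the default `tf.keras` in TensorFlow 2.16; the function names are hypothetical and not part of the plugin.

```python
from importlib import metadata


def parse_major_minor(version):
    """Return the leading (major, minor) integers of a version string like '2.16.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def defaults_to_keras3(tf_version):
    """Assumption: TensorFlow 2.16 and later use Keras 3 as the default tf.keras."""
    return parse_major_minor(tf_version) >= (2, 16)


def installed_tensorflow_version():
    """Return the installed tensorflow version string, or None if it is not installed."""
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_tensorflow_version()
    if version is None:
        print("tensorflow is not installed in this environment")
    else:
        print(f"tensorflow {version}: Keras 3 by default = {defaults_to_keras3(version)}")
```

For the environment reported above, `defaults_to_keras3("2.16.1")` is `True`, which matches the `keras/src/...` paths in the traceback.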

@comatose-tortoise

Any progress on this?

Member

jdeschamps commented Apr 6, 2024

Hi,

So it is not a quick fix: it seems that n2v (and potentially csbdeep) is not compatible with newer TF versions (juglab/n2v#150). Since we are not actively developing n2v*, you will have to use an older version of TF, which is always tricky and annoying. I went back through the old versions of the TF installation pages and copied the installation instructions into the n2v readme.

* we are not developing it actively because we are working on its successor, which is PyTorch based. We will announce it on the n2v repo and here, and later this year we will update the napari plugin.

@comatose-tortoise

  • we are not developing it actively because we are working on its successor, which is PyTorch based. We will announce it on the n2v repo and here, and later this year we will update the napari plugin.

Awesome! Will it have native support for Apple Silicon through the PyTorch-Metal project?

@jdeschamps
Member

We definitely hope so, but that does depend more on Apple and Facebook than on us. :)
