
Can't Fit a Binary Classifier that Uses Gemma Pre-trained Model Embeddings #2102

rlcauvin opened this issue Feb 17, 2025 · 7 comments

rlcauvin commented Feb 17, 2025

Describe the bug

Using the "gemma2_2b_en" Gemma pre-trained model in a neural network results in ValueError: Cannot get result() since the metric has not yet been built. during training.

To Reproduce

Stripped-down example here: https://colab.research.google.com/drive/1r8XkaQBeUxP5fp9i1QLaikFIdbhcrKMw?usp=sharing

Expected behavior

It should be possible to use a Gemma pre-trained model as a neural network layer in a binary classifier and successfully train the model.

Additional context

This use of Gemma to generate embeddings for binary classification is based on this starting point by @jeffcarp.
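
For context, a minimal sketch of the kind of architecture in question, assuming a custom layer that wraps `GemmaCausalLM` and mean-pools the backbone's hidden states (the layer name, pooling choice, and classification head below are illustrative assumptions, not the exact Colab code):

```python
import keras
import keras_hub

# Sketch only: wrap Gemma as an encoder layer that turns raw strings into
# fixed-size embeddings for a binary classification head.
class GemmaEncoder(keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.gemma_lm = keras_hub.models.GemmaCausalLM.from_preset("gemma2_2b_en")

    def call(self, texts):
        # Tokenize the raw strings, run the backbone, and average the
        # per-token hidden states over the sequence dimension.
        x = self.gemma_lm.preprocessor.generate_preprocess(texts)
        hidden = self.gemma_lm.backbone(x)
        return keras.ops.mean(hidden, axis=1)

inputs = keras.Input(shape=(), dtype="string")
embeddings = GemmaEncoder()(inputs)
outputs = keras.layers.Dense(1, activation="sigmoid")(embeddings)
nn_model = keras.Model(inputs, outputs)
nn_model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
# Calling nn_model.fit(...) on a setup like this is where the ValueError
# described above is reported.
```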

Would you like to help us fix it?

No

github-actions bot added the Gemma (Gemma model specific issues) label Feb 17, 2025
rlcauvin (Author) commented:

The reason I believe the problem is isolated to the Gemma encoding layer is that training the classifier works fine if I swap in a `keras.layers.TextVectorization` layer in its place.
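
For illustration, a minimal sketch of that baseline swap, assuming a simple embed-and-pool head (all sizes and names are placeholders, not the exact Colab code):

```python
# Baseline encoder that trains without error: a plain Keras text pipeline
# standing in for the Gemma layer. Vocabulary size, sequence length, and
# embedding width here are placeholders.
vectorizer = keras.layers.TextVectorization(
    max_tokens=20_000, output_mode="int", output_sequence_length=128
)
vectorizer.adapt(train_texts)  # train_texts: your raw training strings

inputs = keras.Input(shape=(), dtype="string")
x = vectorizer(inputs)
x = keras.layers.Embedding(input_dim=20_000, output_dim=64)(x)
x = keras.layers.GlobalAveragePooling1D()(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)
baseline_model = keras.Model(inputs, outputs)
```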

jeffcarp self-assigned this Feb 18, 2025
jeffcarp (Member) commented:

I think the problem is related to instantiating a sub-model (`keras_hub.models.GemmaCausalLM`) within the context of another model. Can you try this?

```python
self.gemma_lm = keras_hub.models.GemmaCausalLM.from_preset("gemma2_2b_en", compile=False)
```

This unblocks training locally for me but I am seeing a TPU error when trying the fix in your Colab, unsure if related.
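
One plausible reading of why this helps, assuming keras_hub task models come pre-compiled by default (an assumption, not something confirmed in this thread):

```python
# Default: the task model arrives already compiled, carrying its own
# optimizer/loss/metric state inside the outer classifier.
lm = keras_hub.models.GemmaCausalLM.from_preset("gemma2_2b_en")

# With compile=False the nested model carries no metric state of its own,
# so only the outer classifier's compile()/fit() bookkeeping is in play.
lm = keras_hub.models.GemmaCausalLM.from_preset("gemma2_2b_en", compile=False)
```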

rlcauvin (Author) commented:

Thanks, @jeffcarp. The change did get past the original error. Maybe it's the TPU error to which you're referring?

```
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
<ipython-input-14-d7863a64cd24> in <cell line: 0>()
      2 y_train = keras.ops.array([[1], [0], [0]])
      3 
----> 4 nn_model_history = nn_model.fit(
      5   x = x_train,
      6   y = y_train,

1 frames
/usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
    120             # To get the full stack trace, call:
    121             # `keras.config.disable_traceback_filtering()`
--> 122             raise e.with_traceback(filtered_tb) from None
    123         finally:
    124             del filtered_tb

/usr/local/lib/python3.11/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57       e.message += " name: " + name
     58     raise core._status_to_exception(e) from None
---> 59   except TypeError as e:
     60     keras_symbolic_tensors = [x for x in inputs if _is_keras_symbolic_tensor(x)]
     61     if keras_symbolic_tensors:

NotFoundError: Graph execution error:

Detected at node StatefulPartitionedCall defined at (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py", line 37, in <module>
  File "/usr/local/lib/python3.11/dist-packages/traitlets/config/application.py", line 992, in launch_instance
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelapp.py", line 712, in start
  File "/usr/local/lib/python3.11/dist-packages/tornado/platform/asyncio.py", line 205, in start
  File "/usr/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
  File "/usr/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
  File "/usr/lib/python3.11/asyncio/events.py", line 84, in _run
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 499, in process_one
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 730, in execute_request
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/ipkernel.py", line 383, in do_execute
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/zmqshell.py", line 528, in run_cell
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 2975, in run_cell
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3030, in _run_cell
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/async_helpers.py", line 78, in _pseudo_sync_runner
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3257, in run_cell_async
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3473, in run_ast_nodes
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
  File "<ipython-input-14-d7863a64cd24>", line 4, in <cell line: 0>
  File "/usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
  File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 371, in fit
  File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 219, in function
  File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 132, in multi_step_on_iterator

could not find registered transfer manager for platform Host -- check target linkage
	 [[{{node StatefulPartitionedCall}}]] [Op:__inference_multi_step_on_iterator_22334]
```

jeffcarp (Member) commented:

I'm able to get training running on GPU, but the Colab instance doesn't have enough memory to load the full Gemma preset:
https://colab.research.google.com/drive/1NoMXBJV_RDH70rTK9ueSR7nRjibv8i2a?usp=sharing

The TPU error looks like it's related to a TF version mismatch:
https://www.kaggle.com/models/google/gemma/discussion/511235

rlcauvin (Author) commented:

Is there a magic combination of package versions we can use to get it to run on the TPU?

Gopi-Uppari commented Feb 26, 2025

Hi @rlcauvin,

I was able to reproduce the issue by running your code on a TPU in Google Colab. To fix it, try setting `run_eagerly=True` in `model.compile()`. You can also check out this gist for reference.
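
In sketch form (the loss, optimizer, and metrics below are placeholders for whatever the Colab actually uses):

```python
nn_model.compile(
    loss="binary_crossentropy",  # placeholder
    optimizer="adam",            # placeholder
    metrics=["accuracy"],        # placeholder
    run_eagerly=True,            # run step-by-step instead of as a compiled graph
)
```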

Thank you.

rlcauvin (Author) commented:

Thanks, @Gopi-Uppari. Using `run_eagerly=True` in `model.compile()` did indeed get past the graph execution error. I'm not sure why, but I also had to set `compile=False` when loading the preset:

```python
self.gemma_lm = keras_hub.models.GemmaCausalLM.from_preset("gemma2_2b_en", compile=False)
```

Now I'll see what happens when I deploy the binary classifier to a TensorFlow Serving endpoint.
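
For reference, exporting a Keras 3 model as a SavedModel for TensorFlow Serving is typically done with `Model.export()` (the path below is a placeholder):

```python
# Write a SavedModel that TensorFlow Serving can load; the trailing "1"
# is the version directory Serving expects. The path is illustrative.
nn_model.export("/models/gemma_binary_classifier/1")
```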
