TFViTModel and interpolate_pos_encoding=True #36155

Open
2 of 4 tasks
carlosg-m opened this issue Feb 13, 2025 · 0 comments
Labels
bug TensorFlow Anything TensorFlow

carlosg-m commented Feb 13, 2025

System Info

  • transformers version: 4.48.3
  • Platform: Linux-5.15.0-1078-azure-x86_64-with-glibc2.35
  • Python version: 3.11.0rc1
  • Huggingface_hub version: 0.27.1
  • Safetensors version: 0.4.2
  • Accelerate version: 0.31.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (True)
  • Tensorflow version (GPU?): 2.16.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: NO
  • Using GPU in script?: YES
  • GPU type: Tesla V100-PCIE-16GB

Who can help?

@amyeroberts, @qubvel, @gante, @Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The script below builds a Keras model on top of the Vision Transformer TFViTModel.
I want to use images at a higher resolution than the default 224x224, as described in the documentation.
With interpolate_pos_encoding=True, model.fit raises an error.
With the default resolution and interpolate_pos_encoding=False, the script works.

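import tensorflow as tf
from transformers import ViTConfig, TFViTModel

# Note: from_pretrained is a classmethod, so it loads the checkpoint's own config;
# the ViTConfig(image_size=512) passed to the constructor is not applied to base_model.
config = ViTConfig(image_size=512)
base_model = TFViTModel(config).from_pretrained('google/vit-base-patch16-224')

# Channels-first input at 512x512 instead of the default 224x224.
inputs = tf.keras.Input((3, 512, 512), dtype='float32')
x = base_model.vit(inputs, interpolate_pos_encoding=True, training=True).pooler_output
output = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.Model(inputs=[inputs], outputs=[output])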
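
For context on why interpolate_pos_encoding is needed at all, here is a minimal arithmetic sketch. It assumes the 16x16 patch size and single [CLS] token of google/vit-base-patch16-224; the helper name vit_sequence_length is hypothetical and not part of transformers.

```python
def vit_sequence_length(image_size: int, patch_size: int = 16) -> int:
    """Number of tokens a ViT sees for a square image: patches plus one [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


# The checkpoint was trained at 224x224, so it stores this many position embeddings.
pretrained_positions = vit_sequence_length(224)
# A 512x512 input produces a longer token sequence than the checkpoint has positions for.
high_res_positions = vit_sequence_length(512)

print(pretrained_positions, high_res_positions)
```

The mismatch between the two counts is what the position-embedding interpolation is meant to bridge when the input resolution differs from the pretraining resolution.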

Error traceback:

OperatorNotAllowedInGraphError: in user code:

    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/training.py", line 1398, in train_function  *
        return step_function(self, iterator)
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/training.py", line 1370, in run_step  *
        outputs = model.train_step(data)
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/training.py", line 1147, in train_step  *
        y_pred = self(x, training=True)
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/training.py", line 565, in error_handler  *
        del filtered_tb
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/training.py", line 588, in __call__  *
        return super().__call__(*args, **kwargs)
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/training.py", line 565, in error_handler  *
        del filtered_tb
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/base_layer.py", line 1136, in __call__  *
        outputs = call_fn(inputs, *args, **kwargs)
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/functional.py", line 514, in call  *
        return self._run_internal_graph(inputs, training=training, mask=mask)
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/functional.py", line 671, in _run_internal_graph  *
        outputs = node.layer(*args, **kwargs)
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/training.py", line 560, in error_handler  *
        filtered_tb = _process_traceback_frames(e.__traceback__)
    File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/base_layer.py", line 1136, in __call__  *
        outputs = call_fn(inputs, *args, **kwargs)
    File "/tmp/__autograph_generated_filepnn_cad_.py", line 162, in error_handler  **
        raise ag__.converted_call(ag__.ld(new_e).with_traceback, (ag__.ld(e).__traceback__,), None, fscope_1) from None
    File "/tmp/__autograph_generated_filepnn_cad_.py", line 34, in error_handler
        retval__1 = ag__.converted_call(ag__.ld(fn), tuple(ag__.ld(args)), dict(**ag__.ld(kwargs)), fscope_1)

    OperatorNotAllowedInGraphError: Exception encountered when calling layer 'vit' (type TFViTMainLayer).
    
    in user code:
    
        File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/transformers/modeling_tf_utils.py", line 598, in run_call_with_unpacked_inputs  *
            return func(self, **unpacked_inputs)
        File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/transformers/models/vit/modeling_tf_vit.py", line 595, in call  *
            embedding_output = self.embeddings(
        File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/training.py", line 560, in error_handler  *
            filtered_tb = _process_traceback_frames(e.__traceback__)
        File "/databricks/python/lib/python3.11/site-packages/tf_keras/src/engine/base_layer.py", line 1136, in __call__  *
            outputs = call_fn(inputs, *args, **kwargs)
        File "/tmp/__autograph_generated_filepnn_cad_.py", line 162, in error_handler  **
            raise ag__.converted_call(ag__.ld(new_e).with_traceback, (ag__.ld(e).__traceback__,), None, fscope_1) from None
        File "/tmp/__autograph_generated_filepnn_cad_.py", line 34, in error_handler
            retval__1 = ag__.converted_call(ag__.ld(fn), tuple(ag__.ld(args)), dict(**ag__.ld(kwargs)), fscope_1)
    
        OperatorNotAllowedInGraphError: Exception encountered when calling layer 'embeddings' (type TFViTEmbeddings).
        
        in user code:
        
            File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/transformers/models/vit/modeling_tf_vit.py", line 128, in call  *
                batch_size, num_channels, height, width = shape_list(pixel_values)
        
            OperatorNotAllowedInGraphError: Iterating over a symbolic `tf.Tensor` is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.
        
        
        Call arguments received by layer 'embeddings' (type TFViTEmbeddings):
          • pixel_values=tf.Tensor(shape=<unknown>, dtype=float32)
          • interpolate_pos_encoding=True
          • training=True
    
    
    Call arguments received by layer 'vit' (type TFViTMainLayer):
      • pixel_values=tf.Tensor(shape=<unknown>, dtype=float32)
      • head_mask=None
      • output_attentions=None
      • output_hidden_states=None
      • interpolate_pos_encoding=True
      • return_dict=None
      • training=True
File <command-6957984842183233>, line 18
      8 #base_model.trainable = False
     10 model.compile(optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-6), 
     11               loss={'output_qualidade': tf.keras.losses.BinaryCrossentropy(label_smoothing=0.1),
     12                     'output_armario': tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
   (...)
     15                        'output_armario': tf.keras.metrics.AUC(curve='PR', multi_label=True, name='auc'),
     16                        'output_dano': tf.keras.metrics.AUC(curve='PR', multi_label=True, name='auc')})
---> 18 train_history = model.fit(x=train_generator, 
     19                           epochs=110,
     20                           validation_data=val_generator,
     21                           validation_freq=1,
     22                           callbacks=[merge_metrics, early_stoping],
     23                           verbose=2)
File /databricks/python/lib/python3.11/site-packages/mlflow/utils/autologging_utils/safety.py:578, in safe_patch.<locals>.safe_patch_function(*args, **kwargs)
    568 try_log_autologging_event(
    569     AutologgingEventLogger.get_logger().log_patch_function_start,
    570     session,
   (...)
    574     kwargs,
    575 )
    577 if patch_is_class:
--> 578     patch_function.call(call_original, *args, **kwargs)
    579 else:
    580     patch_function(call_original, *args, **kwargs)
File /databricks/python/lib/python3.11/site-packages/mlflow/utils/autologging_utils/safety.py:165, in PatchFunction.call(cls, original, *args, **kwargs)
    163 @classmethod
    164 def call(cls, original, *args, **kwargs):
--> 165     return cls().__call__(original, *args, **kwargs)
File /databricks/python/lib/python3.11/site-packages/mlflow/utils/autologging_utils/safety.py:176, in PatchFunction.__call__(self, original, *args, **kwargs)
    172     self._on_exception(e)
    173 finally:
    174     # Regardless of what happens during the `_on_exception` callback, reraise
    175     # the original implementation exception once the callback completes
--> 176     raise e
File /databricks/python/lib/python3.11/site-packages/mlflow/utils/autologging_utils/safety.py:169, in PatchFunction.__call__(self, original, *args, **kwargs)
    167 def __call__(self, original, *args, **kwargs):
    168     try:
--> 169         return self._patch_implementation(original, *args, **kwargs)
    170     except (Exception, KeyboardInterrupt) as e:
    171         try:
File /databricks/python/lib/python3.11/site-packages/mlflow/utils/autologging_utils/safety.py:227, in with_managed_run.<locals>.PatchWithManagedRun._patch_implementation(self, original, *args, **kwargs)
    224 if not mlflow.active_run():
    225     self.managed_run = create_managed_run()
--> 227 result = super()._patch_implementation(original, *args, **kwargs)
    229 if self.managed_run:
    230     mlflow.end_run(RunStatus.to_string(RunStatus.FINISHED))
File /databricks/python/lib/python3.11/site-packages/mlflow/tensorflow/__init__.py:1334, in autolog.<locals>.FitPatch._patch_implementation(self, original, inst, *args, **kwargs)
   1327     except Exception as e:
   1328         _logger.warning(
   1329             "Failed to log training dataset information to "
   1330             "MLflow Tracking. Reason: %s",
   1331             e,
   1332         )
-> 1334 history = original(inst, *args, **kwargs)
   1336 if log_models:
   1337     _log_keras_model(history, args)
File /databricks/python/lib/python3.11/site-packages/mlflow/utils/autologging_utils/safety.py:561, in safe_patch.<locals>.safe_patch_function.<locals>.call_original(*og_args, **og_kwargs)
    558         original_result = original(*_og_args, **_og_kwargs)
    559         return original_result
--> 561 return call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)
File /databricks/python/lib/python3.11/site-packages/mlflow/utils/autologging_utils/safety.py:496, in safe_patch.<locals>.safe_patch_function.<locals>.call_original_fn_with_event_logging(original_fn, og_args, og_kwargs)
    487 try:
    488     try_log_autologging_event(
    489         AutologgingEventLogger.get_logger().log_original_function_start,
    490         session,
   (...)
    494         og_kwargs,
    495     )
--> 496     original_fn_result = original_fn(*og_args, **og_kwargs)
    498     try_log_autologging_event(
    499         AutologgingEventLogger.get_logger().log_original_function_success,
    500         session,
   (...)
    504         og_kwargs,
    505     )
    506     return original_fn_result
File /databricks/python/lib/python3.11/site-packages/mlflow/utils/autologging_utils/safety.py:558, in safe_patch.<locals>.safe_patch_function.<locals>.call_original.<locals>._original_fn(*_og_args, **_og_kwargs)
    550 # Show all non-MLflow warnings as normal (i.e. not as event logs)
    551 # during original function execution, even if silent mode is enabled
    552 # (`silent=True`), since these warnings originate from the ML framework
    553 # or one of its dependencies and are likely relevant to the caller
    554 with set_non_mlflow_warnings_behavior_for_current_thread(
    555     disable_warnings=False,
    556     reroute_warnings=False,
    557 ):
--> 558     original_result = original(*_og_args, **_og_kwargs)
    559     return original_result
File /databricks/python/lib/python3.11/site-packages/tf_keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb
File /databricks/python/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/autograph_util.py:52, in py_func_from_autograph.<locals>.autograph_handler(*args, **kwargs)
     50 except Exception as e:  # pylint:disable=broad-except
     51   if hasattr(e, "ag_error_metadata"):
---> 52     raise e.ag_error_metadata.to_exception(e)
     53   else:
     54     raise
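
The call arguments in the traceback show pixel_values arriving with shape=<unknown>, which is consistent with the failure when the result of shape_list is unpacked. One possible source, and this is only an assumption because train_generator is not shown in the report, is a tf.data pipeline built from a Python generator without a declared shape. The sketch below uses a hypothetical image_generator to show how the static shape differs with and without a shape in output_signature. It is not a confirmed fix for this issue.

```python
import numpy as np
import tensorflow as tf


def image_generator():
    # Hypothetical stand-in for train_generator: yields one channels-first image.
    yield np.zeros((3, 512, 512), dtype=np.float32)


# No shape declared: elements carry a fully unknown static shape.
unknown_ds = tf.data.Dataset.from_generator(
    image_generator,
    output_signature=tf.TensorSpec(shape=None, dtype=tf.float32),
)

# Shape declared: the static shape is kept, and batching adds a leading None dimension.
known_ds = tf.data.Dataset.from_generator(
    image_generator,
    output_signature=tf.TensorSpec(shape=(3, 512, 512), dtype=tf.float32),
).batch(2)

print(unknown_ds.element_spec.shape.rank)
print(known_ds.element_spec.shape)
```

If the input pipeline does declare shapes and the error persists, the unknown shape is being introduced somewhere else and this sketch does not apply.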

Expected behavior

model.fit should run and train the model without errors.

@carlosg-m carlosg-m added the bug label Feb 13, 2025
@Rocketknight1 Rocketknight1 added the TensorFlow Anything TensorFlow label Feb 13, 2025