Remove _supports_static_cache = True for some model classes #34975

ydshieh · 2024-11-27T15:34:36Z

What does this PR do?

Remove _supports_static_cache = True for some model classes. See the comments in changes.

They were set to True before simply because static cache could be used without torch.compile. But after #34247, static cache is more or less tied to torch.compile, so we should only say a model supports it if it actually works with torch.compile.

ydshieh · 2024-11-27T15:36:37Z

src/transformers/models/granitemoe/modeling_granitemoe.py

@@ -330,6 +330,8 @@ def forward(self, hidden_states):
        )  # [num_tokens, num_experts]
        gates = zeros.scatter(1, top_k_indices, 1)  # [num_tokens, num_experts]
        expert_size = gates.long().sum(0)  # [num_experts,]
+        # (This cause torch.compile to fail with `torch._dynamo.exc.Unsupported: Backend compiler failed with a fake tensor exception at`)
+        # (and `DataDependentOutputException`)
        expert_size = expert_size.tolist()


jimba has this line expert_size = expert_size.tolist() too, and it does not set _supports_static_cache = True. Let's do the same for this model.
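
To illustrate why the result of expert_size.tolist() depends on the data and not only on tensor shapes, here is a plain-Python analogue of the gating bookkeeping in the hunk above. It is only a sketch: `expert_sizes` and the sample routings are hypothetical names and values, and the real model does this with tensor ops (`scatter`, `sum`) rather than loops.

```python
def expert_sizes(top_k_indices, num_experts):
    """Count how many tokens are routed to each expert.

    top_k_indices: one list of expert ids per token (hypothetical input,
    standing in for the [num_tokens, top_k] index tensor).
    """
    sizes = [0] * num_experts
    for token_experts in top_k_indices:
        for expert_id in token_experts:
            sizes[expert_id] += 1
    return sizes


# Two routings with the same shape (3 tokens, top-2) but different contents.
routing_a = [[0, 1], [0, 2], [0, 3]]
routing_b = [[1, 2], [1, 2], [3, 0]]

sizes_a = expert_sizes(routing_a, num_experts=4)
sizes_b = expert_sizes(routing_b, num_experts=4)
```

Both inputs have identical shapes, yet the per-expert counts differ. Any code that consumes these counts as Python integers therefore needs the actual values, which is consistent with the `DataDependentOutputException` mentioned in the code comment.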

ydshieh · 2024-11-27T15:38:28Z

src/transformers/models/granitemoe/modeling_granitemoe.py

-            if attention_mask.max() != 0:
-                raise ValueError("Custom 4D attention mask should be passed in inverted form with max==0`")


This is the only place where I see the extra check attention_mask.max() != 0 inside the if attention_mask is not None and attention_mask.dim() == 4 branch. I am not sure we really need it, but under torch.compile it raises yet another error (different from the one that expert_size = expert_size.tolist() gives above).

cc @mayank31398, you may know better whether this check is really necessary.

It is not in the granite modeling code, though.

ydshieh · 2024-11-27T15:40:55Z

src/transformers/models/idefics/modeling_idefics.py

@@ -1155,7 +1156,7 @@ def forward(
        elif position_ids is None:
            position_ids = cache_position.unsqueeze(0)

-        if (pixel_values, image_encoder_embeddings, perceiver_embeddings).count(None) != 2:


This fails under torch.compile with yet another type of error.
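
The removed check requires that exactly one of the three inputs is provided. A minimal sketch of an identity-based way to express the same condition follows. The helper name `exactly_one_provided` and the `AlwaysEqual` class are hypothetical, and this is not confirmed to be the replacement used in this PR. The point it shows is that `tuple.count` compares elements with `==`, which objects can overload, while `is None` never calls `__eq__`.

```python
def exactly_one_provided(*inputs):
    """Return True when exactly one argument is not None (identity check only)."""
    return sum(x is not None for x in inputs) == 1


class AlwaysEqual:
    """Hypothetical object whose == is overloaded, as array-like types often do."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# tuple.count relies on ==, so the overloaded __eq__ changes the result.
count_based = (obj, None, None).count(None)

# The identity-based check is unaffected by __eq__.
identity_based = exactly_one_provided(obj, None, None)
```

With the `AlwaysEqual` object, the count-based form reports three `None` entries even though only two are present, while the identity-based form still sees exactly one provided input.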

HuggingFaceDocBuilderDev · 2024-11-27T16:12:22Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ydshieh added 8 commits November 27, 2024 15:38

try 1

f3dbc6f

try 1

c935878

try 1

853ed93

try 1

bb921ed

try 1

565a6b1

try 1

2facf26

try 1

b630024

try 1

7117566

try 1

b371e98

ydshieh changed the title ~~Set some~~ Remove _supports_static_cache = True for some model classes Nov 27, 2024

ydshieh requested review from ArthurZucker November 27, 2024 15:48
