Add Zamba2 #34517

Merged: 90 commits into huggingface:main · Jan 27, 2025

Conversation

@pglorio (Contributor) commented Oct 30, 2024

What does this PR do?

This PR adds support for the Zamba2 architecture created by Zyphra Technologies.

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

@pglorio marked this pull request as draft on October 30, 2024 at 17:57
@pglorio (Contributor Author) commented Nov 11, 2024

Hey @Arthur,

Thank you again for your help in getting Zamba2 into transformers! The PR is now finally ready to be reviewed. I added the documentation and all unit tests pass, including slow tests.

A few remarks, mostly related to modular transformers:

  1. To generate the modeling and configuration files, I used utils/modular_model_converter.py from a previous commit, because the most recent version of the script (which followed a large refactoring) produces an error that I was not able to fix:
Converting src/transformers/models/zamba2/modular_zamba2.py to a single model single file format
Traceback (most recent call last):
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1510, in <module>
    converted_files = convert_modular_file(file_name, args.old_model_name, args.new_model_name)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1447, in convert_modular_file
    for file, module in create_modules(cst_transformers).items():
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1387, in create_modules
    nodes_to_add, file_type, new_imports = get_class_node_and_dependencies(modular_mapper, class_name, node, files)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1337, in get_class_node_and_dependencies
    new_node_dependencies, new_imports = check_dependencies_and_create_import_node(
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in check_dependencies_and_create_import_node
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in <setcomp>
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
KeyError: 'Zamba2Config'

I carefully compared Zamba2Config with the classes of other models that also use modular (such as Gemma2Config) and they appear to have a consistent format. Relatedly, the utils/modular_model_converter.py in the current PR (path) is the version from the previous commit mentioned above.

  2. After running utils/modular_model_converter.py, the generated modeling and configuration files contain unintended code that I had to update. All these modifications are in this commit. In particular, the generated modeling file contains Zamba2DynamicCache, which is the correct cache for Zamba2, as well as HybridMambaAttentionDynamicCache, which is the cache of Zamba and is not relevant to Zamba2, so I deleted HybridMambaAttentionDynamicCache and the related references.

  3. I ran make fixup and all zamba-related tests pass, with the exception of python utils/check_modular_conversion.py. That check fails because of the modifications mentioned in the previous point.

  4. I slightly edited Zamba2MambaMixer compared to the original Mamba2Mixer of mamba2. The main difference is that I added these lines, which was necessary to appropriately process the mamba2 cache (note that this step already existed in the torch forward in these lines).

Looking forward to your feedback. Thanks so much!

@pglorio (Contributor Author) commented Jan 17, 2025

Hello @Cyrilvallez, I ran all model tests on two GPUs and after a couple of minor fixes everything appears to work now. I'm skipping this test as it gives an error related to mamba2 kernels. I indeed verified that mamba2 skips that test here.
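
For reference, a minimal sketch of what such a skip typically looks like in a transformers test file; the class and test names below are placeholders, not the actual skipped test:

import unittest

class Zamba2ModelTestSketch(unittest.TestCase):  # hypothetical stand-in for the real test class
    @unittest.skip(reason="Fails due to mamba2 custom-kernel requirements; mamba2 skips this test as well.")
    def test_requiring_mamba2_kernels(self):  # hypothetical test name
        ...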

Separately, when running utils/check_modular_conversion.py I get the following error:

Differences found between the generated code and src/transformers/models/zamba2/modeling_zamba2.py:

   1 --- src/transformers/models/zamba2/modeling_zamba2.py_generated
   2 +++ src/transformers/models/zamba2/modeling_zamba2.py
   3 @@ -313,6 +313,13 @@
   4      return attn_output, attn_weights
   5  
   6  
   7 +def rotate_half(x):
   8 +    """Rotates half the hidden dims of the input."""
   9 +    x1 = x[..., : x.shape[-1] // 2]
  10 +    x2 = x[..., x.shape[-1] // 2 :]
  11 +    return torch.cat((-x2, x1), dim=-1)
  12 +
  13 +
  14  def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
  15      """Applies Rotary Position Embedding to the query and key tensors.
  16  
  17 @@ -338,13 +345,6 @@
  18      q_embed = (q * cos) + (rotate_half(q) * sin)
  19      k_embed = (k * cos) + (rotate_half(k) * sin)
  20      return q_embed, k_embed
  21 -
  22 -
  23 -def rotate_half(x):
  24 -    """Rotates half the hidden dims of the input."""
  25 -    x1 = x[..., : x.shape[-1] // 2]
  26 -    x2 = x[..., x.shape[-1] // 2 :]
  27 -    return torch.cat((-x2, x1), dim=-1)

which I was not getting before, even though this part is identical.

@Cyrilvallez (Member) left a comment:

LGTM! Let's just wait for #35795 which will get rid of the CI failure for modular conversion! Sorry about that, and thanks for being so patient with us 🙏🙏🤗
Great work!

@pglorio (Contributor Author) commented Jan 21, 2025

Awesome, sounds good!

@ArthurZucker (Collaborator) left a comment:

Thanks! A few comments about the code paths and regex init, and it should be good!

Resolved review threads (outdated):
  • docs/source/en/model_doc/zamba2.md
  • src/transformers/models/zamba2/modular_zamba2.py (5 threads)

"shared_transformer.pre_ff_layernorm.weight",
]
self._tied_weights_keys = [*self._tied_weights_keys, *[prefix_name + key for key in tied_keys]]
if self.config.use_shared_mlp_adapter:
Collaborator:

Same comment about the code path: which models have this set to true / false?

Collaborator:

  • tied keys support regex patterns, so we should never have to add all of them manually like this

Contributor Author:

All checkpoints have config.use_shared_mlp_adapter set to True. We have internal checkpoints with this flag set to False, which might be released in the future.

Collaborator:

I'd rather we add a new model when they are released than have two code paths 😉 it's two different models for us!

Contributor Author:

Replaced tied keys with regex patterns here.
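
For illustration, a hedged sketch of how a regex entry can replace the manually built list of tied keys; the class name and module path below are assumptions for the example, not necessarily the exact Zamba2 naming:

# Illustrative only: one regex pattern matches the tied norm weight of every
# shared block, instead of appending a "prefix + key" string per block.
class Zamba2ModelSketch:  # hypothetical stand-in for the real model class
    _tied_weights_keys = [r"shared_transformer\.pre_ff_layernorm\.weight"]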

Contributor Author:

> I'd rather we add a new model when they are released than have two code paths 😉 it's two different models for us!

Sounds good, I got rid of config.use_shared_mlp_adapter here.

, dtype=torch.float32) # fmt: skip

torch.testing.assert_close(logits[0, -1, :40].cpu(), EXPECTED_LOGITS_NO_GRAD_0, rtol=1e-3, atol=1e-3)
torch.testing.assert_close(logits[1, -1, :40].cpu(), EXPECTED_LOGITS_NO_GRAD_1, rtol=1e-3, atol=1e-3)
Collaborator:

It's missing a test on CPU with the slow forward!

Contributor Author:

Could you please say more about this?

Contributor Author:

Done here for both test_simple_generate and test_simple_batched_generate_with_padding.

test_simple_generate passes on CPU straightforwardly. test_simple_batched_generate_with_padding marginally fails on CPU for one of the two output logits in the batch (disagreement on 2 out of 40 logits):

        torch.testing.assert_close(logits[0, -1, :40].cpu(), EXPECTED_LOGITS_NO_GRAD_0, rtol=1e-3, atol=1e-3)
>       torch.testing.assert_close(logits[1, -1, :40].cpu(), EXPECTED_LOGITS_NO_GRAD_1, rtol=1e-3, atol=1e-3)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 2 / 40 (5.0%)
E       Greatest absolute difference: 0.009563922882080078 at index (12,) (up to 0.001 allowed)
E       Greatest relative difference: 0.030748309567570686 at index (0,) (up to 0.001 allowed)

Given that this is a 1.2B-parameter model, it's not so surprising to find occasional small discrepancies in the forward pass when running a model of this size on CPU instead of GPU. I relaxed the absolute tolerance when the test runs on CPU here: atol=1e-3 -> atol=6e-3 if torch_device == "cpu" else 1e-3.
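
Concretely, the updated assertion for the failing check reads roughly as follows (torch_device is imported from transformers.testing_utils in the test file):

# Relax the absolute tolerance on CPU only; on GPU it stays at 1e-3.
atol = 6e-3 if torch_device == "cpu" else 1e-3
torch.testing.assert_close(logits[1, -1, :40].cpu(), EXPECTED_LOGITS_NO_GRAD_1, rtol=1e-3, atol=atol)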

Collaborator:

Yep, sounds good, thanks a lot for checking this!

@pglorio (Contributor Author) commented Jan 24, 2025

Thank you @ArthurZucker! I think all your comments have been addressed. All zamba-related tests appear to pass!

@ArthurZucker (Collaborator) left a comment:

Thanks! Only a small comment left regarding code paths and then it's good to go!

Resolved review thread (outdated): docs/source/en/model_doc/zamba2.md

query_states = self.q_proj(hidden_states)
key_states = self.k_proj(hidden_states)
value_states = self.v_proj(hidden_states)
if self.config.use_shared_attention_adapter:
Collaborator:

I don't know if I asked already, but similarly, is this true / false for the released checkpoints?

Contributor Author:

This is true for some of the released checkpoints and false for other released checkpoints.

key_states = key_states.view(hidden_shape).transpose(1, 2)
value_states = value_states.view(hidden_shape).transpose(1, 2)

if self.config.use_mem_rope:
Collaborator:

Same comment, let's weed out the final bits that are not part of the released checkpoints!

Contributor Author:

This is also true for some of the released checkpoints and false for others.

Comment on lines +963 to +967
if config.use_mem_rope:
    if config.use_long_context:
        logger.warning_once(
            "`use_long_context` set to `True`: using rescaled `rope_theta` and extended `max_position_embeddings`."
        )
Collaborator:

same comment here!

@pglorio (Contributor Author) commented Jan 24, 2025:

Same for this: it is a flag that rescales rope_theta and improves the model's performance on long-context tasks. It is specific to the 7B checkpoint, so it can be either true or false depending on the checkpoint.
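
For context, a purely hypothetical sketch of the kind of rescaling the use_long_context warning above refers to; the scaling factor and extended context length are placeholders, not the actual Zamba2 values:

# Hypothetical illustration of use_long_context: rescale rope_theta and extend
# max_position_embeddings. The numbers below are placeholders only.
if config.use_mem_rope and config.use_long_context:
    config.rope_theta = config.rope_theta * 8.0        # placeholder scaling factor
    config.max_position_embeddings = 16384             # placeholder extended length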

@pglorio (Contributor Author) commented Jan 24, 2025

Thanks @ArthurZucker, I replied to your comments above.

@ArthurZucker marked this pull request as ready for review on January 27, 2025 at 09:26
@ArthurZucker merged commit 33cb1f7 into huggingface:main on Jan 27, 2025
23 checks passed