Fix model kwargs #35875
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
test_modeling_names.txt (outdated):
to remove
loss = self.loss_function(
    shift_logits.view(batch_size * seq_length, vocab_size),
    shift_labels.view(batch_size * seq_length),
    vocab_size=vocab_size,
    **kwargs,
)
Bit weird, the refactor here should mean you only have to pass the inputs, and the shifting will happen inside.
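For context on what that refactor implies, here is a minimal sketch (not the actual transformers loss; names are illustrative) of a causal-LM loss that does the shifting internally, so callers only pass the raw logits and labels:

```python
import torch.nn.functional as F

def causal_lm_loss(logits, labels, vocab_size, ignore_index=-100, **kwargs):
    # Shift inside the loss: position i predicts token i + 1, so drop the
    # last logit and the first label instead of asking every model to do it.
    shift_logits = logits[..., :-1, :].contiguous().view(-1, vocab_size)
    shift_labels = labels[..., 1:].contiguous().view(-1)
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=ignore_index)

# A model forward would then only need:
# loss = self.loss_function(logits, labels, vocab_size=vocab_size, **kwargs)
```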
Is it normal that some checks were not successful?
Nice thanks!
test_modeling_names.txt (outdated):
to delete!
if hasattr(self, "_loss_function"):
    return self._loss_function
@ArthurZucker this needed to be added for a few models that don't need everything the loss func does. The case was xglm.
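As a rough illustration of why that fallback branch is useful (hypothetical names, not the real xglm code): a model can assign a leaner loss to self._loss_function when it does not need the shared loss's extra bookkeeping:

```python
import torch.nn.functional as F

def simple_xglm_like_loss(logits, labels, vocab_size, **kwargs):
    # Accept and ignore extras such as num_items_in_batch that the shared
    # loss would otherwise act on.
    return F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))

# In the model's __init__ (hypothetical):
#     self._loss_function = simple_xglm_like_loss
# so the hasattr(self, "_loss_function") branch above returns it directly.
```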
Force-pushed from cbddbc9 to 6b380f4.
Finally ready to go, sorry it took me a bit, lots of models to triple-check 😓
Taking this comment into account: #34191 (comment). cc @bauwenst, the getter and setter for self._loss_function should be of help!
I need to review, but I think it does help to be able to set self._loss_function, for sure. Now the question is whether or not we want to explicitly do it in our models!
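A minimal sketch of the getter/setter pattern being discussed, under the assumption that it is a plain Python property (not the exact library code):

```python
import torch.nn.functional as F

class LossFunctionMixin:
    @property
    def loss_function(self):
        # Prefer an explicitly assigned loss, otherwise fall back to a default.
        if hasattr(self, "_loss_function"):
            return self._loss_function
        return lambda logits, labels, **kwargs: F.cross_entropy(logits, labels)

    @loss_function.setter
    def loss_function(self, value):
        self._loss_function = value

# Usage: model.loss_function = my_custom_loss  # takes precedence from then on
```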
thanks sir! 🫡
Force-pushed from d93121a to 038dc55.
* Save state
* Make a failing test
* Better test
* mpt -> done, many more to go
* Rm extranious
* Bamba
* Bert
* big_bird
* biogpt
* bloom
* codegen
* ctrl
* data2vec
* dbrx
* Through up to Dbrx
* electra
* ernie
* falcon
* Fuyu/persimmon
* Include noop kwargs to base models
* Rebase
* Skip musigen
* Refactor/skip mllama
* Revert makefile
* Rm file
* Fix PT failing, need to modify rest of loss funcs to not resize
* Propagate some
* Continue
* More
* More options
* Mostly fixed
* Proved that it's the same
* Bloom is good
* Make ability to override loss func possible
* Fixup
* Clean
* Fix xglm
* Quality tests
* Skip OCR2
* Make specific loss for xglm
* Make order the same/line up 1:1
* xglm
* Skip fx output loss bloom model
* Didn't pass in pad_token_id
* Fix quality
What does this PR do?
Adds unused **kwargs to particular models so that num_items_in_batch can work as intended.

Fixes #35838
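To illustrate what the forwarded **kwargs buy (a simplified toy sketch, not any specific model touched by this PR): the forward signature accepts keyword arguments it never uses itself and hands them to the loss, so the Trainer-provided num_items_in_batch can normalize the loss correctly under gradient accumulation:

```python
import torch.nn as nn

class TinyCausalLM(nn.Module):
    # Hypothetical toy model; only the kwargs plumbing matters here.
    def __init__(self, vocab_size=32, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size)
        self.vocab_size = vocab_size

    def loss_function(self, logits, labels, num_items_in_batch=None, **kwargs):
        # Sum per-token losses, then normalize by the true token count across
        # all gradient-accumulation steps when the Trainer supplies it.
        loss = nn.functional.cross_entropy(
            logits.view(-1, self.vocab_size), labels.view(-1), reduction="sum"
        )
        denom = num_items_in_batch if num_items_in_batch is not None else labels.numel()
        return loss / denom

    def forward(self, input_ids, labels=None, **kwargs):
        logits = self.lm_head(self.embed(input_ids))
        loss = None
        if labels is not None:
            # kwargs the model itself never uses (e.g. num_items_in_batch)
            # are simply passed through to the loss.
            loss = self.loss_function(logits, labels, **kwargs)
        return loss, logits
```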
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker