
Fix model kwargs #35875

Merged: 45 commits merged into main from muellerzr-fix-model-kwargs on Feb 6, 2025

Conversation

muellerzr (Contributor)

What does this PR do?

Adds unused `**kwargs` to particular models so that `num_items_in_batch` can work as intended.

Fixes #35838
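
For context, a minimal sketch of the failing pattern and the shape of the fix (a hypothetical toy model, not the actual diff): the Trainer passes `num_items_in_batch` as a keyword argument, so any model whose `forward` does not accept `**kwargs` raises the TypeError from the linked issue; adding the pass-through lets the value reach the loss function.

```python
# Hypothetical toy model; layer names and sizes are illustrative, not the actual
# transformers code touched by this PR.
import torch

class ToyCausalLM(torch.nn.Module):
    def __init__(self, vocab_size: int = 32, hidden: int = 16):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, hidden)
        self.lm_head = torch.nn.Linear(hidden, vocab_size)
        self.vocab_size = vocab_size

    def loss_function(self, logits, labels, vocab_size, num_items_in_batch=None, **kwargs):
        # Sum per-token losses, then normalize by the true token count when the
        # Trainer supplies it (this is what num_items_in_batch is for).
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, vocab_size), labels.view(-1), reduction="sum"
        )
        denom = num_items_in_batch if num_items_in_batch is not None else labels.numel()
        return loss / denom

    def forward(self, input_ids, labels=None, **kwargs):  # the added, otherwise unused **kwargs
        logits = self.lm_head(self.embed(input_ids))
        loss = None
        if labels is not None:
            loss = self.loss_function(logits, labels, vocab_size=self.vocab_size, **kwargs)
        return loss, logits

# Without **kwargs in forward, this call is exactly what raised
# "forward() got an unexpected keyword argument 'num_items_in_batch'".
model = ToyCausalLM()
ids = torch.randint(0, 32, (2, 8))
loss, _ = model(ids, labels=ids, num_items_in_batch=16)
```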

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

to remove

Comment on lines 572 to 573
loss = self.loss_function(
    shift_logits.view(batch_size * seq_length, vocab_size),
    shift_labels.view(batch_size * seq_length),
    vocab_size=vocab_size,
    **kwargs,
)
@ArthurZucker (Collaborator) Jan 24, 2025

Bit weird: the refactor here should mean you only have to pass the inputs, and the shifts will happen inside the loss function.
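
A sketch of what "the shifts will happen inside" could look like: a causal-LM loss helper that receives the unshifted logits and labels and does the one-token shift itself, so call sites only pass the raw tensors. This is illustrative code, not the exact `transformers` loss implementation.

```python
import torch

def causal_lm_loss(logits, labels, vocab_size, num_items_in_batch=None,
                   ignore_index=-100, **kwargs):
    # Shift inside the loss: token t predicts token t + 1.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    reduction = "sum" if num_items_in_batch is not None else "mean"
    loss = torch.nn.functional.cross_entropy(
        shift_logits.view(-1, vocab_size),
        shift_labels.view(-1),
        ignore_index=ignore_index,
        reduction=reduction,
    )
    if num_items_in_batch is not None:
        loss = loss / num_items_in_batch
    return loss

# Call site after the refactor: no manual shifting, just pass logits and labels.
logits = torch.randn(2, 8, 32)
labels = torch.randint(0, 32, (2, 8))
loss = causal_lm_loss(logits, labels, vocab_size=32, num_items_in_batch=14)
```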

@Bachstelze

Is it normal that some checks were not successful?

@ArthurZucker (Collaborator) left a comment

Nice thanks!

Collaborator

to delete!

@ArthurZucker ArthurZucker marked this pull request as ready for review January 30, 2025 14:20
Comment on lines +5171 to +5203
if hasattr(self, "_loss_function"):
    return self._loss_function

@muellerzr (Contributor, Author)

@ArthurZucker this needed to be added for a few models that don't need everything the default loss function does; the case in point was xglm.
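
A sketch of the mechanism being described, assuming a `loss_function` property whose getter prefers an explicitly set `self._loss_function` and otherwise falls back to a default. The default's name and the bare class here are illustrative, not the actual transformers internals.

```python
import torch

def default_causal_lm_loss(logits, labels, **kwargs):
    # Generic fallback used when no model-specific loss has been set.
    return torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1)
    )

class ModelWithOverridableLoss:
    @property
    def loss_function(self):
        # An explicitly set loss (e.g. an xglm-specific one) takes precedence...
        if hasattr(self, "_loss_function"):
            return self._loss_function
        # ...otherwise fall back to the generic default.
        return default_causal_lm_loss

    @loss_function.setter
    def loss_function(self, value):
        self._loss_function = value

# Per-instance override, without touching forward():
def simple_sum_loss(logits, labels, **kwargs):
    return torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), reduction="sum"
    )

model = ModelWithOverridableLoss()
model.loss_function = simple_sum_loss        # stored on self._loss_function
assert model.loss_function is simple_sum_loss
```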

@muellerzr muellerzr force-pushed the muellerzr-fix-model-kwargs branch from cbddbc9 to 6b380f4 Compare February 5, 2025 18:49
@muellerzr (Contributor, Author)

Finally ready to go, sorry it took me a bit, lots of models to triple check 😓

@ArthurZucker (Collaborator) left a comment

Taking this comment into account: #34191 (comment)
cc @bauwenst, the getter and setter for `self._loss_function` should be of help!
I still need to review, but being able to set `self._loss_function` definitely helps. The question now is whether or not we want to do this explicitly in our models.

@ArthurZucker (Collaborator) left a comment

thanks sir! 🫡

src/transformers/models/gpt_neox/modeling_gpt_neox.py (outdated, resolved)
src/transformers/models/xglm/modeling_xglm.py (outdated, resolved)
@muellerzr muellerzr force-pushed the muellerzr-fix-model-kwargs branch from d93121a to 038dc55 Compare February 6, 2025 15:45
@muellerzr muellerzr merged commit 28f73bc into main Feb 6, 2025
26 checks passed
@muellerzr muellerzr deleted the muellerzr-fix-model-kwargs branch February 6, 2025 16:35
ArthurZucker pushed a commit that referenced this pull request Feb 6, 2025
* Save state

* Make a failing test

* Better test

* mpt -> done, many more to go

* Rm extraneous

* Bamba

* Bert

* big_bird

* biogpt

* bloom

* codegen

* ctrl

* data2vec

* dbrx

* Through up to Dbrx

* electra

* ernie

* falcon

* Fuyu/persimmon

* Include noop kwargs to base models

* Rebase

* Skip musigen

* Refactor/skip mllama

* Revert makefile

* Rm file

* Fix PT failing, need to modify rest of loss funcs to not resize

* Propagate some

* Continue

* More

* More options

* Mostly fixed

* Proved that it's the same

* Bloom is good

* Make ability to override loss func possible

* Fixup

* Clean

* Fix xglm

* Quality tests

* Skip OCR2

* Make specific loss for xglm

* Make order the same/line up 1:1

* xglm

* Skip fx output loss bloom model

* Didn't pass in pad_token_id

* Fix quality
Development

Successfully merging this pull request may close these issues.

forward() got an unexpected keyword argument 'num_items_in_batch'
4 participants