Fix truncation length assertion #1382

Closed

boyleconnor
Contributor

Fixes #1326

I haven't seen any explanation from @Narsil why the line in question was changed, so I just went ahead and made this PR reversing it.

I've tested my version of this locally and can confirm that there are no longer two different errors at different places for slightly different invalid stride values:

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained('bert-base-cased')
tokenizer.enable_truncation(max_length=10, stride=9)  # This line still fails (as it should)
print(tokenizer.encode("This piece of text is at least ten tokens long. In fact, it is likely many more than that."))

tokenizer = Tokenizer.from_pretrained('bert-base-cased')
tokenizer.enable_truncation(max_length=10, stride=8)  # Now this line (correctly) fails too
print(tokenizer.encode("This piece of text is at least ten tokens long. In fact, it is likely many more than that."))
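
The arithmetic behind these two cases can be sketched without the library. This is only an illustration under stated assumptions: it assumes the check compares `stride` against `max_length` minus the number of special tokens added to a single sequence (two for BERT, `[CLS]` and `[SEP]`), and the helper name `check_truncation_params` is hypothetical, not part of the tokenizers API.

```python
def check_truncation_params(max_length: int, stride: int, n_added_tokens: int) -> None:
    """Hypothetical helper mirroring the assumed validation, not the library's code."""
    # Assumption: special tokens are reserved out of max_length first.
    effective_max_length = max_length - n_added_tokens
    # A stride equal to the effective length would never advance the window,
    # so the comparison has to reject equality as well as anything larger.
    if stride >= effective_max_length:
        raise ValueError(
            f"stride={stride} must be strictly less than "
            f"effective max length {effective_max_length}"
        )


# bert-base-cased adds 2 special tokens to a single sequence, so max_length=10
# leaves an effective length of 8.
for stride in (7, 8, 9):
    try:
        check_truncation_params(max_length=10, stride=stride, n_added_tokens=2)
        print(f"stride={stride}: accepted")
    except ValueError as err:
        print(f"stride={stride}: rejected ({err})")
```

Under that assumption, a comparison that only rejects `stride > effective_max_length` would let `stride=8` through at configuration time and leave it to fail later during encoding, which matches the two-different-errors behavior described above.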

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@github-actions github-actions bot added the Stale label Dec 6, 2023
@github-actions github-actions bot closed this Dec 12, 2023
@ArthurZucker
Collaborator

Sorry @boyleconnor, I'll see if this should be merged or not; I am not sure either 😉 This was not really breaking, but the fix is breaking in a way.

@boyleconnor
Contributor Author

@ArthurZucker I'm not sure what you mean. Would you mind elaborating?

Successfully merging this pull request may close these issues.

stride assertion check no longer catches all invalid truncation parameters
3 participants