
The purpose of TEXT_ENCODING_OFFSET #104

Open
quochung-04 opened this issue Aug 16, 2024 · 2 comments

Comments

@quochung-04

quochung-04 commented Aug 16, 2024

Just for clarification: I see that in the semantic model, TEXT_ENCODING_OFFSET is added to the encoded text, where it is defined as TEXT_ENCODING_OFFSET = 10_048.

Does anyone understand why this offset is added? Won't it cause the encoded text to deviate from the original tokens? I can see there is an lm_head with an output size of 10048, but this still leaves me confused.

Thanks in advance.


@phantomwork

phantomwork commented Aug 22, 2024

Hi, @quochung-04

You are really detail-oriented. :)
I think it's because BarkCoarseModel takes the output of BarkSemanticModel as its input, and BarkSemanticModel's LM head has size 10048, so semantic token IDs occupy the range 0–10047. The text tokens are fed into the model together with those semantic tokens in the same sequence, so they are shifted by the offset of 10048 to keep the two ID ranges from colliding.
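A minimal sketch of that collision argument (the offset value comes from the question above; `SEMANTIC_VOCAB_SIZE`, `build_prompt`, and the example token IDs are illustrative placeholders, not the repo's actual code):

```python
# Illustrative sketch: why shifting text token IDs keeps them from colliding
# with semantic token IDs when both are packed into one input sequence.
import numpy as np

SEMANTIC_VOCAB_SIZE = 10_000      # placeholder: semantic-side IDs stay below the offset
TEXT_ENCODING_OFFSET = 10_048     # text token IDs are shifted above that range


def build_prompt(text_token_ids, semantic_history_ids):
    """Concatenate text and semantic tokens into a single sequence of IDs.

    Without the offset, text ID 5 and semantic ID 5 would map to the same
    embedding row; with it, the two token types land in disjoint ID ranges.
    """
    shifted_text = np.asarray(text_token_ids) + TEXT_ENCODING_OFFSET
    return np.concatenate([shifted_text, np.asarray(semantic_history_ids)])


prompt = build_prompt([5, 17, 302], [5, 9_999])
print(prompt)  # [10053 10065 10350     5  9999] -- no ID collisions
```

If that reading is right, it also fits the lm_head size of 10048 you mentioned: the head only ever predicts IDs in the semantic range, while the offset text IDs exist purely on the input side.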
