
The purpose of TEXT_ENCODING_OFFSET #104

Open
quochung-04 opened this issue Aug 16, 2024 · 2 comments

Comments

@quochung-04

quochung-04 commented Aug 16, 2024

Just for clarification: I see that in the semantic model, TEXT_ENCODING_OFFSET is added to the encoded text, where it is defined as TEXT_ENCODING_OFFSET = 10_048.

Does anyone understand why this offset is added? Won't it cause the encoded text to deviate from the original tokens? I can see there is an lm_head with an output size of 10048, but this still leaves me confused.

Thanks in advance.


@phantomwork

phantomwork commented Aug 22, 2024

Hi, @quochung-04

You are really detail-oriented. :)
I think it's because BarkCoarseModel takes the output of BarkSemanticModel as its input, and BarkSemanticModel's LM head has size 10048, so semantic token IDs occupy the range 0–10047. The text tokens are fed into the model together with those semantic tokens in the same sequence, so they are shifted by the offset of 10048 to keep the two ID ranges from colliding.
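A minimal sketch of that collision argument (the offset value comes from the question above; `SEMANTIC_VOCAB_SIZE`, `build_prompt`, and the example token IDs are illustrative placeholders, not the repo's actual code):

```python
# Illustrative sketch: why shifting text token IDs keeps them from colliding
# with semantic token IDs when both are packed into one input sequence.
import numpy as np

SEMANTIC_VOCAB_SIZE = 10_000      # placeholder: semantic-side IDs stay below the offset
TEXT_ENCODING_OFFSET = 10_048     # text token IDs are shifted above that range


def build_prompt(text_token_ids, semantic_history_ids):
    """Concatenate text and semantic tokens into a single sequence of IDs.

    Without the offset, text ID 5 and semantic ID 5 would map to the same
    embedding row; with it, the two token types land in disjoint ID ranges.
    """
    shifted_text = np.asarray(text_token_ids) + TEXT_ENCODING_OFFSET
    return np.concatenate([shifted_text, np.asarray(semantic_history_ids)])


prompt = build_prompt([5, 17, 302], [5, 9_999])
print(prompt)  # [10053 10065 10350     5  9999] -- no ID collisions
```

If that reading is right, it also fits the lm_head size of 10048 you mentioned: the head only ever predicts IDs in the semantic range, while the offset text IDs exist purely on the input side.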
