Autoregressive mode and embedding calculation addition #62
Conversation
    ).hidden_states[-1]
else:
    # First element of model_output contains all token embeddings
    token_embeddings = self.model(input_ids, attention_mask)[0]
I don't think the first element is useful here, because the attention is left-to-right (causal).
I think that is the existing code; I only moved the selection of the zeroth item out of the mean pooling function and put it here.
else:
    # First element of model_output contains all token embeddings
    token_embeddings = self.model(input_ids, attention_mask)[0]
    embeddings = self.mean_pooling(token_embeddings, attention_mask)
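For context, the mean_pooling helper called here is not shown in the diff. A minimal sketch, assuming it follows the common masked-mean pattern (the name and signature are taken from the call above, the body is an assumption):

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    # Average the token embeddings, counting only non-padding positions.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts
```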
Why do we need a pooling step if we select only a single embedding?
I guess the two methods should be:
- Selecting the EOS embedding as the representation, since it has seen all the previous tokens (a minimal sketch follows below).
- Getting the embeddings for every token and pooling them.
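A minimal sketch of the first option, assuming right-padded inputs (the function name and signature are illustrative, not from this PR): take the hidden state of the last non-padding token, which in a causal model has attended to the whole sequence.

```python
import torch

def last_token_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    # Index of the last non-padding token per sequence (assumes right padding).
    last_idx = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(token_embeddings.size(0), device=token_embeddings.device)
    return token_embeddings[batch_idx, last_idx]  # (batch, hidden)
```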
I believe the shape of the token embeddings is (1, <length of tokenized inputs>, <number of features>). The output from the hidden-states flag has shape [number of layers, 1, length of tokenized inputs, number_of_features], so I think we're doing number 2 here?
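The shapes can be checked directly; a small sketch (gpt2 is just an example checkpoint, not the model used in this project):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tokenizer("an example sentence", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple with one tensor per layer (plus the embedding layer),
# each of shape (1, seq_len, hidden_size); the last entry equals out.last_hidden_state.
print(len(out.hidden_states))            # number of layers + 1
print(out.hidden_states[-1].shape)       # (1, seq_len, hidden_size)
print(torch.equal(out.hidden_states[-1], out.last_hidden_state))  # True
```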
Got it. I guess we can do the same trick for encoder-only models as well.
2. add is_autoregressive flag to eval portion
3. fix if to elif in save hook
4. remove default save statement
5. unwrap model properly while saving
I guess we can remove the mean-embedding addition, with a note saying we got better results by taking the EOS token as the representation.
Looks good
BGE models require CLS pooling, not mean pooling: https://huggingface.co/BAAI/bge-large-en#frequently-asked-questions
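For reference, CLS pooling just takes the first token's hidden state; a minimal sketch (the helper name is illustrative):

```python
import torch

def cls_pooling(token_embeddings: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, hidden); the [CLS] token sits at position 0.
    return token_embeddings[:, 0]
```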
Changes
- autoregressive flag to notify AutoModelSequenceEmbeddings and AutoModelForRagE2E that an autoregressive model is being used
- mean_pooling is more applicable to both scenarios (clm and mlm)
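Putting the pieces together, a hedged sketch of how the autoregressive flag might switch the embedding path (the function is illustrative and mirrors the diff above rather than the project's exact code; in both branches the result is then fed to mean_pooling):

```python
def token_embeddings(model, input_ids, attention_mask, autoregressive: bool = False):
    if autoregressive:
        # Causal LMs (clm): request hidden states and take the last layer.
        return model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
        ).hidden_states[-1]
    # Encoder models (mlm): the first element of the output holds all token embeddings.
    return model(input_ids=input_ids, attention_mask=attention_mask)[0]
```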