updates to grounded flow #53
src/instructlab/sdg/default_flows.py
Outdated
"filter_value": "2.0",
"operation": operator.eq,
"convert_dtype": float,
This change is somehow breaking the knowledge generation. The filter_relevancy block is returning a dataset with 0 rows (probably because it filters out all the responses generated by the previous block). Is this expected? The same goes for the filter_verify_question block below.
Ah, this could be because the file src/instructlab/sdg/configs/knowledge/simple_generate_qa.yaml still contains
{question_1}
{response_1}
{question_2}
{response_2}
{question_3}
{response_3}
instead of
{icl_query_1}
{icl_response_1}
{icl_query_2}
{icl_response_2}
{icl_query_3}
{icl_response_3}
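A mismatch like this can be caught before a run by comparing the named fields in a prompt template against the columns the dataset actually provides. The following is a minimal sketch using only the Python standard library; it is not the instructlab code path, and the `placeholders` and `missing_fields` helpers and the `columns` set are hypothetical names introduced for illustration.

```python
from string import Formatter


def placeholders(template):
    """Return the set of named fields that str.format would look up."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(template, available):
    """Return template fields that have no matching dataset column."""
    return placeholders(template) - set(available)


# Template text as quoted in the comment above (old and new field names).
old_template = (
    "{question_1}\n{response_1}\n"
    "{question_2}\n{response_2}\n"
    "{question_3}\n{response_3}"
)
new_template = (
    "{icl_query_1}\n{icl_response_1}\n"
    "{icl_query_2}\n{icl_response_2}\n"
    "{icl_query_3}\n{icl_response_3}"
)

# Assumption: the dataset exposes columns under the icl_* names.
columns = {
    "icl_query_1", "icl_response_1",
    "icl_query_2", "icl_response_2",
    "icl_query_3", "icl_response_3",
}

print(sorted(missing_fields(old_template, columns)))
print(sorted(missing_fields(new_template, columns)))
```

Under the stated assumption, the old template reports all six of its fields as missing, while the new template reports none.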
#57 addresses this
The commit log could be much improved. The logical changes I see, with the explanations I'd expect:
Here are some explanations for the changes:
This addresses an issue that we observed. In some cases where the model is asked to provide a total score, it returns floating-point sums like 1.5/2.5. To accommodate those cases, where we want to filter those values out, it made sense to keep the dtype as float.
That is a step towards converting it to the final format (messages) required for training.
The template changes stem from model behavior on the templates and stress testing across various leaf nodes. The changes were mainly made to make sure the responses from the model align best with the expected behavior.
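The effect of the float conversion on an equality filter can be illustrated with a small standalone sketch. This is not the instructlab filter block; `keep_row` and the `score` column are hypothetical, and the only values taken from this PR are the `"2.0"` filter value, `operator.eq`, and `float`.

```python
import operator


def keep_row(row, column, filter_value, operation, convert_dtype=None):
    """Decide whether a row passes the filter.

    Both the row value and the filter value are converted with
    convert_dtype when it is given; rows whose value cannot be
    converted are dropped.
    """
    value = row[column]
    if convert_dtype is not None:
        try:
            value = convert_dtype(value)
            filter_value = convert_dtype(filter_value)
        except (TypeError, ValueError):
            return False
    return operation(value, filter_value)


# Hypothetical model outputs for a "total score" question.
rows = [{"score": "2"}, {"score": "2.0"}, {"score": "1.5"}, {"score": "n/a"}]

# Float comparison: "2" and "2.0" both equal 2.0; "1.5" and "n/a" are dropped.
as_float = [r for r in rows if keep_row(r, "score", "2.0", operator.eq, float)]

# Int comparison: int("2.0") and int("1.5") raise ValueError, so only "2" survives.
as_int = [r for r in rows if keep_row(r, "score", "2", operator.eq, int)]

# No conversion: plain string equality, so only the exact text "2.0" matches.
as_str = [r for r in rows if keep_row(r, "score", "2.0", operator.eq)]

print(as_float, as_int, as_str)
```

In this sketch the float conversion treats `"2"` and `"2.0"` as the same score and cleanly rejects fractional sums, whereas integer or string comparison each drop one of the valid forms.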
It would be great to get those explanations into the commit messages. See also "Information in commit messages" in https://www.berrange.com/tags/commit-message/
Thanks for the recommendation! We can start doing this from here on, but this PR needs to go in today. We are on a very tight deadline.
a few minor changes, otherwise lgtm
done, thank you!
LGTM, thanks @oindrillac
LGTM, thanks
LGTM
The last two commits just say "updated to string", but neither of them updates anything to be a string.
Either way, it looks like something that should be squashed into your main commit.
This pull request has merge conflicts that must be resolved before it can be
@russellb squashed
Thanks. Can you also add your explanations of the changes that you gave here to the commit messages as suggested?
96c8f4f to 3c302b0
I see that e2e passed, though the commit messages still seem the same? I'm happy for it to merge once that content is there. Let me know if any help is needed with that. You don't need to wait for e2e to pass again if you only change the commit message. Just merge away.
This is a step towards converting it to the final format (messages) required for training.
Signed-off-by: Oindrilla Chatterjee <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Co-authored-by: Shiv <[email protected]>

Changed the prompt templates and alignment with expected outputs. Conducted stress testing across various leaf nodes to ensure accuracy and relevance.
Signed-off-by: Oindrilla Chatterjee <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Co-authored-by: Shiv <[email protected]>

lgtm!
Oh, for some reason it didn't get synced. Thanks for adding descriptive notes @aakankshaduggal
Stress testing was done on the SDG pipeline