Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updates to grounded flow #53

Merged
merged 2 commits into from
Jul 2, 2024
Merged

updates to grounded flow #53

merged 2 commits into from
Jul 2, 2024

Conversation

oindrillac
Copy link
Contributor

Stress testing was done on the SDG pipeline

  • added grounded test script that can be used to test
  • updates to templates to get full pipeline working

Comment on lines 229 to 231
"filter_value": "2.0",
"operation": operator.eq,
"convert_dtype": float,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is somehow breaking the knowledge generation. The filter_relevancy block is returning a dataset with 0 rows (probably because it filters out all the responses generated by the previous block). Is this expected? Same goes for filter_verify_question block below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this could be because the file src/instructlab/sdg/configs/knowledge/simple_generate_qa.yaml still contains

  {question_1}
  {response_1}

  {question_2}
  {response_2}

  {question_3}
  {response_3}
  

instead of

  {icl_query_1}
  {icl_response_1}
 
  {icl_query_2}
  {icl_response_2}

  {icl_query_3}
  {icl_response_3}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#57 addresses this

@markmc
Copy link
Contributor

markmc commented Jul 1, 2024

The commit log could be much improved. The logical changes I see, with the explanations I'd expect:

  1. Convert filter values to a float - does this fix a bug? details of how the bug manifested would help others learn
  2. Combine question and context in the grounded skills flow at the end - why? what's the before and after behavior
  3. template updates to get full pipeline working - these look like important, but subtle changes, what details can you provide to help a person in future understand what to look out for if they are making further changes?

@oindrillac
Copy link
Contributor Author

here are some explanations for the changes:

  1. Convert filter values to a float - does this fix a bug? details of how the bug manifested would help others learn

This is addressing an issue that we observed. In some of the cases where the model is being asked to provide a total score, it provides floating point sums like 1.5/2.5 and to accommodate those cases where we want to filter those values out it made sense to keep it float.

  1. Combine question and context in the grounded skills flow at the end - why? what's the before and after behavior

That is a step towards converting it to the final format (messages) required for training.

  1. template updates to get full pipeline working - these look like important, but subtle changes, what details can you provide to help a person in future understand what to look out for if they are making further changes?

The template changes stem from model behavior on the templates and stress testing across various leaf nodes. The changes were mainly made to make sure the responses from the model align best with the expected behavior

@markmc
Copy link
Contributor

markmc commented Jul 1, 2024

here are some explanations for the changes:

It would be great to get those explanations into the commit messages. See also "Information in commit messages" in https://www.berrange.com/tags/commit-message/

The commit message must contain all the information required to fully understand & review the patch for correctness. Less is not more. More is more.

@shivchander
Copy link
Member

here are some explanations for the changes:

It would be great to get those explanations into the commit messages. See also "Information in commit messages" in https://www.berrange.com/tags/commit-message/

The commit message must contain all the information required to fully understand & review the patch for correctness. Less is not more. More is more.

Thanks for the recommendation! We could start doing this here on, this PR needs to go in by today. We are on a very tight deadline

Copy link
Member

@shivchander shivchander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few minor changes, otherwise lgtm

src/instructlab/sdg/default_flows.py Outdated Show resolved Hide resolved
src/instructlab/sdg/default_flows.py Outdated Show resolved Hide resolved
@oindrillac
Copy link
Contributor Author

a few minor changes, otherwise lgtm

done, thank you!

Copy link
Member

@aakankshaduggal aakankshaduggal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @oindrillac

@aakankshaduggal aakankshaduggal requested a review from npalaska July 1, 2024 18:00
Copy link
Member

@shivchander shivchander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@aakankshaduggal aakankshaduggal requested a review from russellb July 1, 2024 19:29
Copy link
Contributor

@npalaska npalaska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@russellb russellb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the last two commits just say "updated to string" but neither are updating something to be a string.

either way it looks like something that should be squashed in to your main commit

@mergify mergify bot added CI/CD Affects CI/CD configuration testing Relates to testing needs-rebase labels Jul 2, 2024
Copy link
Contributor

mergify bot commented Jul 2, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @oindrillac please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@oindrillac
Copy link
Contributor Author

@russellb squashed

@russellb
Copy link
Member

russellb commented Jul 2, 2024

@russellb squashed

Thanks. Can you also add your explanations of the changes that you gave here to the commit messages as suggested?

@oindrillac oindrillac force-pushed the grounded branch 3 times, most recently from 96c8f4f to 3c302b0 Compare July 2, 2024 18:37
@oindrillac oindrillac requested a review from russellb July 2, 2024 18:37
@oindrillac
Copy link
Contributor Author

oindrillac commented Jul 2, 2024

@russellb @markmc updated the commit messages as per your suggestion, ptal thanks

Copy link
Member

@russellb russellb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that e2e passed, though the commit messages still seem the same? I'm happy for it to merge once that content is there. Let me know if any help is needed with that. You don't need to wait for e2e to pass again if you only change the commit message. Just merge away.

oindrillac and others added 2 commits July 2, 2024 16:34
This is a step towards converting it to the final format (messages) required for training.

Signed-off-by: Oindrilla Chatterjee <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Co-authored-by: Shiv <[email protected]>
Changed the prompt templates and alignment with expected outputs. Conducted stress testing across various leaf nodes to ensure accuracy and relevance.

Signed-off-by: Oindrilla Chatterjee <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Co-authored-by: Shiv <[email protected]>
@russellb
Copy link
Member

russellb commented Jul 2, 2024

lgtm!

@russellb russellb merged commit 2788384 into instructlab:main Jul 2, 2024
10 checks passed
@oindrillac
Copy link
Contributor Author

Oh for some reason it didn't get synced. Thanks for adding descriptive notes @aakankshaduggal

@russellb russellb added this to the 0.1.0 milestone Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/CD Affects CI/CD configuration testing Relates to testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants