
generate_data: fix support for multiple leaf nodes #85

Merged: 1 commit merged into instructlab:main on Jul 8, 2024


russellb (Member) commented Jul 4, 2024

When running generate_data() with a taxonomy containing more than one leaf
node, the code previously appended the resulting datasets as if they
behaved like a Python list. They are `datasets.Dataset` objects, which do
not support the `+` operator.

Instead, keep a list of these datasets. Also update the code that
writes these results to a file to handle the extra level of iteration
now required.

Signed-off-by: Russell Bryant <[email protected]>
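A minimal Python sketch of the pattern described above, assuming nothing about the actual code in this repository. To stay dependency-free, it uses a hypothetical `RowTable` class as a stand-in for `datasets.Dataset`: like a real `Dataset`, it yields one dict per row when iterated but defines no `+`. The helpers `generate_for_leaf`, `generate_all`, and `write_jsonl` are illustrative names only.

```python
import io
import json
from typing import Dict, Iterable, Iterator, List, TextIO


class RowTable:
    """Hypothetical stand-in for datasets.Dataset: iterable over row dicts, no `+`."""

    def __init__(self, rows: List[Dict[str, str]]) -> None:
        self._rows = list(rows)

    def __iter__(self) -> Iterator[Dict[str, str]]:
        return iter(self._rows)


def generate_for_leaf(leaf: str) -> RowTable:
    # Hypothetical per-leaf generation step: one table per leaf node.
    return RowTable([{"leaf": leaf, "output": f"sample for {leaf}"}])


def generate_all(leaves: Iterable[str]) -> List[RowTable]:
    # Collect per-leaf tables in a list. Writing `generated = generated + table`
    # would raise TypeError, because RowTable (like Dataset) has no __add__.
    generated: List[RowTable] = []
    for leaf in leaves:
        generated.append(generate_for_leaf(leaf))
    return generated


def write_jsonl(tables: Iterable[Iterable[Dict[str, str]]], fp: TextIO) -> int:
    # The extra level of iteration: outer loop over tables, inner loop over rows.
    count = 0
    for table in tables:
        for row in table:
            fp.write(json.dumps(row) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    written = write_jsonl(generate_all(["leaf_a", "leaf_b"]), buf)
    print(written)  # 2
```

If a single combined dataset were preferred instead, the `datasets` library's `concatenate_datasets([ds_a, ds_b])` is one alternative (it requires compatible features across the inputs); the change described here keeps a list and iterates it when writing.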

markmc (Contributor) commented Jul 4, 2024

Makes sense, 👍

Do you understand why e2e was passing with this bug?

russellb (Member Author) commented Jul 4, 2024


Yes. It tested all taxonomy types, but only one at a time. I accidentally ran it locally with two and hit this. Test coverage needs to improve; I will file an issue to fix this.

russellb (Member Author) commented Jul 4, 2024

Test coverage issue to fix e2e: instructlab/instructlab#1599. It's an easy fix; I should be able to do it tomorrow.

@russellb russellb added this to the 0.1.0 milestone Jul 8, 2024
@mergify mergify bot added the needs-rebase label Jul 8, 2024

mergify bot (Contributor) commented Jul 8, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @russellb please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

aakankshaduggal (Member) left a comment

Thanks for the PR @russellb. The changes look good, but merging #89 seems to have caused merge conflicts. Can you PTAL?

@russellb russellb force-pushed the fix-multiple-output-datasets branch from 29b703e to 3834f60 on July 8, 2024 at 15:37
russellb (Member Author) commented Jul 8, 2024


rebased on main

@mergify mergify bot removed the needs-rebase label Jul 8, 2024
@russellb russellb merged commit d6091ff into instructlab:main Jul 8, 2024
11 checks passed