-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generate_data: fix support for multiple leaf nodes #85
generate_data: fix support for multiple leaf nodes #85
Conversation
Makes sense, 👍 Do you understand why e2e was passing with this bug? |
Yes. It tested all taxonomy types, but one at a time. I ran it with two by accident locally and hit this. Test coverage improvement is needed. I will file an issue to fix this. |
Test coverage issue to fix e2e. Easy fix, should be able to do it tomorrow instructlab/instructlab#1599 |
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When running generate_data() with a taxonomy with more than one leaf node, the code previously appended the resulting datasets as if they behaved like a Python list. They are a `datasets.Dataset` and do not support the `+` operator. Instead, keep a list of these datasets. Also update the code that writes these results to a file to handle the extra level of iteration now required. Signed-off-by: Russell Bryant <[email protected]>
29b703e
to
3834f60
Compare
When running generate_data() with a taxonomy with more than one leaf
node, the code previously appended the resulting datasets as if they
behaved like a Python list. They are a
datasets.Dataset
and do notsupport the
+
operator.Instead, keep a list of these datasets. Also update the code that
writes these results to a file to handle the extra level of iteration
now required.
Signed-off-by: Russell Bryant [email protected]