Make template input more consistent with each other and taxonomy naming scheme #59

Closed
russellb opened this issue Jul 1, 2024 · 2 comments
Labels
enhancement (New feature or request), stale

Comments

russellb commented Jul 1, 2024

The code dealing with differences between:

  • taxonomy structure and naming scheme
  • input structure and naming for different pipelines

is getting a lot more complicated than seems necessary. #55 dealt with some short-term fixes. This issue tracks cleanup work to make things more consistent and, hopefully, reduce the amount of data transformation needed.

From #55

Longer term consideration:

None of this matches the source format for the dataset (taxonomy). Allowing people to specify a custom pipeline implies specifying their expected sample dataset format somehow.

Another idea instead ...

  1. Always assume a consistent dataset format.
  2. Add a new pipeline capability for dataset transformation: rename fields if you want, or squash the rows into groups of 3 seed questions/answers per row (for the knowledge case).

I think something like this is going to be necessary to allow more configurable custom pipelines, as we'll need a way for a custom pipeline to declare the dataset format it is expecting from a known starting point.
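To make the idea above concrete, here is a minimal sketch of what such a transformation step could look like. Everything in it is an assumption for illustration: the field names (`seed_question`, `seed_answer`, `question`, `answer`), the `PIPELINE_TRANSFORM` declaration, and the helper functions are hypothetical and do not come from the existing code. It uses plain lists of dicts rather than any particular dataset library.

```python
from typing import Any

Row = dict[str, Any]


def rename_fields(rows: list[Row], mapping: dict[str, str]) -> list[Row]:
    """Return new rows with keys renamed per mapping; unmapped keys are kept."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def squash_qa(rows: list[Row], group_size: int = 3) -> list[Row]:
    """Merge consecutive rows into one row holding group_size Q/A pairs.

    A trailing group with fewer than group_size rows is dropped.
    """
    squashed: list[Row] = []
    for start in range(0, len(rows) - group_size + 1, group_size):
        merged: Row = {}
        for i, row in enumerate(rows[start:start + group_size], start=1):
            merged[f"question_{i}"] = row["question"]
            merged[f"answer_{i}"] = row["answer"]
        squashed.append(merged)
    return squashed


# Hypothetical declaration a custom pipeline might carry to describe how to
# get from the one consistent dataset format to the shape it expects.
PIPELINE_TRANSFORM = {
    "rename": {"seed_question": "question", "seed_answer": "answer"},
    "squash": 3,
}


def apply_transform(rows: list[Row], spec: dict[str, Any]) -> list[Row]:
    """Apply the optional rename step, then the optional squash step."""
    rows = rename_fields(rows, spec.get("rename", {}))
    if "squash" in spec:
        rows = squash_qa(rows, spec["squash"])
    return rows


if __name__ == "__main__":
    dataset = [
        {"seed_question": f"q{n}", "seed_answer": f"a{n}"} for n in range(1, 8)
    ]
    for row in apply_transform(dataset, PIPELINE_TRANSFORM):
        print(row)
```

With a declaration like this, the generic code would only need to know the single starting format, and each pipeline would describe its own renames and grouping instead of the core code special-casing them.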

This issue has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.

@github-actions github-actions bot added the stale label Nov 24, 2024
jwm4 pushed a commit to jwm4/sdg that referenced this issue Dec 13, 2024
Added new approved Knowledge submission data sources, updated status for several.

Updated documentation with the new process for taking in requested knowledge sources: open a PR against this devdoc.

Related to issue instructlab#59 which should be closed once this PR is reviewed and merged.

Co-Authored-by: JJ Asghar <[email protected]>
Signed-off-by: Leslie Hawthorn <[email protected]>
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant!

@github-actions github-actions bot closed this as not planned Dec 25, 2024