forked from instructlab/sdg
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
utils: Update taxonomy reading code to handle knowledge v3
This is part of instructlab#160.

The changes here originated from aakankshaduggal@5baf6df.

There are two major changes here.

- When parsing a `qna.yaml` file from a taxonomy tree, adjust for the new schema for knowledge. There is no attempt to maintain compatibility with prior versions of the schema (v1, v2).

- Change how we translate the taxonomy data into the dataset sent into the pipeline as input. Instead of implementing a sliding window approach of 3 sample qna pairs at a time over all chunks of the document, we now create a row per seed_example (context and associated qna pairs) for each chunk of knowledge docs.

Co-authored-by: abhi1092 <[email protected]>
Co-authored-by: shiv <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
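To make the second change concrete, here is a minimal sketch of the two row-building strategies the message contrasts. It is not the code from this commit: the function names, the row layout, and the `context` / `questions_and_answers` keys are assumptions chosen for illustration, and the sliding-window function is only one plausible reading of the earlier behaviour.

```python
from typing import Any


def sliding_window_rows(chunks: list[str], qna_pairs: list[dict[str, str]],
                        window: int = 3) -> list[dict[str, Any]]:
    """Hypothetical 'before': slide a window of `window` qna pairs over every chunk."""
    rows = []
    for chunk in chunks:
        for start in range(len(qna_pairs) - window + 1):
            rows.append({"document": chunk,
                         "qna": qna_pairs[start:start + window]})
    return rows


def rows_per_seed_example(chunks: list[str],
                          seed_examples: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical 'after': one row per seed example for each document chunk."""
    rows = []
    for chunk in chunks:
        for seed in seed_examples:
            # Each row carries the chunk plus the seed example's context.
            row: dict[str, Any] = {"document": chunk, "context": seed["context"]}
            # Flatten the seed example's qna pairs into numbered columns
            # (column naming is an assumption, not taken from the commit).
            for i, qna in enumerate(seed["questions_and_answers"], start=1):
                row[f"question_{i}"] = qna["question"]
                row[f"response_{i}"] = qna["answer"]
            rows.append(row)
    return rows


if __name__ == "__main__":
    chunks = ["chunk one", "chunk two"]
    seeds = [
        {"context": f"ctx {n}",
         "questions_and_answers": [
             {"question": f"q{n}.{k}", "answer": f"a{n}.{k}"} for k in range(3)
         ]}
        for n in range(2)
    ]
    for row in rows_per_seed_example(chunks, seeds):
        print(row["document"], "|", row["context"], "|", row["question_1"])
```

Under these assumptions the number of rows in the new approach is simply the number of chunks times the number of seed examples, and each row keeps a context together with its own qna pairs rather than mixing pairs from different seed examples.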
1 parent 33abe1e, commit 94a7a5e
Showing 1 changed file with 47 additions and 48 deletions.