From db702e3463efd68c30559fafa7e6d17aef5b45bf Mon Sep 17 00:00:00 2001
From: Kelly Brown <kelbrown@redhat.com>
Date: Wed, 18 Sep 2024 13:46:54 -0400
Subject: [PATCH] [Docs] Updates for SDG README

Signed-off-by: Kelly Brown <kelbrown@redhat.com>
---
 README.md | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ed1a242b..5f3d6609 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# sdg
+# Synthetic Data Generation (SDG)
 
 ![Lint](https://github.com/instructlab/sdg/actions/workflows/lint.yml/badge.svg?branch=main)
 ![Build](https://github.com/instructlab/sdg/actions/workflows/pypi.yaml/badge.svg?branch=main)
@@ -6,3 +6,61 @@
 ![License](https://img.shields.io/github/license/instructlab/sdg)
 
 Python library for Synthetic Data Generation
+
+## Introduction
+
+Synthetic Data Generation (SDG) is a process that creates an artificially generated dataset that mimics real data based on provided examples. SDG uses a YAML file containing question-and-answer pairs as input data.
+
+## Installing the SDG library
+
+Clone the library and navigate to the repo:
+```
+git clone https://github.com/instructlab/sdg
+cd sdg
+```
+
+Install the library:
+```
+pip install .
+```
+
+## Using the library
+
+You can use the SDG library with the following items:
+
+```python
+from instructlab.sdg.generate_data import generate_data
+from instructlab.sdg.utils import GenerateException
+```
+
+<!--Not sure what more you're thinking of adding here -->
+
+## Pipelines
+
+There are four pipelines that are used in SDG. Each pipeline requires specific hardware specifications.
+<!--TODO: Add explanations of pipelines-->
+
+*Full* -
+
+*Simple* -
+
+*Schema* -
+
+<!--TODO: Add content here-->
+
+## Repository structure
+
+```
+|-- sdg/src/instructlab/pipelines/ (1)
+|-- sdg/src/instructlab/configs/ (2)
+|-- sdg/src/instructlab/utils/ (3)
+|-- sdg/docs/ (4)
+|-- sdg/scripts/ (5)
+|-- sdg/tests/ (6)
+```
+1. Contains the YAML code that configures the SDG pipelines
+2.
+3.
+4.
+5.
+6. Contains all the CI tests for the SDG repository
\ No newline at end of file