Commit
Signed-off-by: Kelly Brown <[email protected]>
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,66 @@ | ||
# sdg | ||
# Synthetic Data Generation (SDG) | ||
|
||
![Lint](https://github.com/instructlab/sdg/actions/workflows/lint.yml/badge.svg?branch=main) | ||
![Build](https://github.com/instructlab/sdg/actions/workflows/pypi.yaml/badge.svg?branch=main) | ||
![Release](https://img.shields.io/github/v/release/instructlab/sdg) | ||
![License](https://img.shields.io/github/license/instructlab/sdg) | ||
|
||
Python library for Synthetic Data Generation | ||
|
||
## Introduction | ||
|
||
Synthetic Data Generation (SDG) is a process that creates an artificial dataset that mimics real data, based on provided examples. SDG takes a YAML file containing question-and-answer pairs as its input data. | ||
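
To make the input concrete, here is a minimal sketch of how question-and-answer seed data might be checked once it has been loaded from YAML. The field names (`seed_examples`, `question`, `answer`) and the helper `validate_seed_examples` are assumptions for illustration, not a schema confirmed by this README.

```python
from typing import Any


def validate_seed_examples(data: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (question, answer) pairs, raising ValueError on malformed input."""
    examples = data.get("seed_examples")
    if not isinstance(examples, list) or not examples:
        raise ValueError("expected a non-empty 'seed_examples' list")

    pairs: list[tuple[str, str]] = []
    for index, example in enumerate(examples):
        if not isinstance(example, dict):
            raise ValueError(f"seed example {index} must be a mapping")
        question = example.get("question")
        answer = example.get("answer")
        # Both fields must be present and be non-empty strings.
        if not (isinstance(question, str) and question.strip()):
            raise ValueError(f"seed example {index} has no question")
        if not (isinstance(answer, str) and answer.strip()):
            raise ValueError(f"seed example {index} has no answer")
        pairs.append((question.strip(), answer.strip()))
    return pairs


# Hypothetical data, shaped like what a YAML parser would return.
sample = {
    "seed_examples": [
        {"question": "What is SDG?", "answer": "Synthetic Data Generation."},
        {"question": "What format is the input?", "answer": "A YAML file."},
    ]
}
pairs = validate_seed_examples(sample)
```

Validating the seed file before generation keeps a malformed example from surfacing as a harder-to-trace failure later in a run.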
|
||
## Installing the SDG library | ||
|
||
Clone the library and navigate to the repo: | ||
|
||
```shell | ||
git clone https://github.com/instructlab/sdg | ||
cd sdg | ||
``` | ||
|
||
Install the library: | ||
|
||
```shell | ||
pip install . | ||
``` | ||
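
After installing, one way to confirm that the package is visible to the current interpreter is to look it up without running any generation. This is a minimal sketch: the helper name `is_importable` is hypothetical, and the module path `instructlab.sdg` is taken from the import statements in this README.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located by the import system."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    print("instructlab.sdg importable:", is_importable("instructlab.sdg"))
```

If this prints `False`, the install most likely went into a different environment than the one running the script.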
|
||
## Using the library | ||
|
||
You can use the SDG library by importing the following items: | ||
|
||
```python | ||
from instructlab.sdg.generate_data import generate_data | ||
from instructlab.sdg.utils import GenerateException | ||
``` | ||
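
A caller would typically wrap generation in error handling. The sketch below shows that pattern with local stand-ins, because the arguments of the real `generate_data` are not documented in this README. `GenerateException` here is a locally defined placeholder class, and `run_generation`, `fake_generate`, and the `output_dir` argument are hypothetical names.

```python
from typing import Any, Callable


class GenerateException(Exception):
    """Local stand-in for instructlab.sdg.utils.GenerateException."""


def run_generation(generate: Callable[..., Any], **kwargs: Any) -> bool:
    """Call `generate` with kwargs; return False if it raises GenerateException."""
    try:
        generate(**kwargs)
    except GenerateException as exc:
        print(f"Generation failed: {exc}")
        return False
    return True


def fake_generate(output_dir: str) -> None:
    """Hypothetical placeholder for generate_data; fails on an empty path."""
    if not output_dir:
        raise GenerateException("output_dir must not be empty")


ok = run_generation(fake_generate, output_dir="generated")
failed = run_generation(fake_generate, output_dir="")
```

With the library installed, the stand-ins would be replaced by the two real imports, and the keyword arguments by whatever `generate_data` actually accepts.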
|
||
<!-- Not sure what more you're thinking of adding here --> | ||
|
||
## Pipelines | ||
|
||
SDG uses four pipelines. Each pipeline has its own hardware requirements. | ||
|
||
<!--TODO: Add explanations of pipelines--> | ||
|
||
*Full* - | ||
|
||
*Simple* - | ||
|
||
*Schema* - | ||
|
||
<!--TODO: Add content here--> | ||
|
||
## Repository structure | ||
|
||
```text | ||
|-- sdg/src/instructlab/pipelines/ (1) | ||
|-- sdg/src/instructlab/configs/ (2) | ||
|-- sdg/src/instructlab/utils/ (3) | ||
|-- sdg/docs/ (4) | ||
|-- sdg/scripts/ (5) | ||
|-- sdg/tests/ (6) | ||
``` | ||
|
||
1. Contains the YAML code that configures the SDG pipelines | ||
2. | ||
3. | ||
4. | ||
5. | ||
6. Contains all the CI tests for the SDG repository |