[Docs] Updates for SDG README
Signed-off-by: Kelly Brown <[email protected]>
kelbrown20 committed Sep 18, 2024
1 parent 432c2d1 commit db702e3
Showing 1 changed file with 59 additions and 1 deletion.
60 changes: 59 additions & 1 deletion README.md
@@ -1,8 +1,66 @@
# sdg
# Synthetic Data Generation (SDG)

![Lint](https://github.com/instructlab/sdg/actions/workflows/lint.yml/badge.svg?branch=main)
![Build](https://github.com/instructlab/sdg/actions/workflows/pypi.yaml/badge.svg?branch=main)
![Release](https://img.shields.io/github/v/release/instructlab/sdg)
![License](https://img.shields.io/github/license/instructlab/sdg)

Python library for Synthetic Data Generation

## Introduction

Synthetic Data Generation (SDG) is a process that creates an artificially generated dataset that mimics real data based on provided examples. SDG uses a YAML file containing question-and-answer pairs as input data.
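
As a rough illustration of the input described above, the following sketch checks that every question-and-answer pair is complete before it is handed to SDG. The dictionary stands in for an already parsed YAML file, and the `seed_examples`, `question`, and `answer` keys, as well as the `find_incomplete_pairs` helper, are assumptions made for this example rather than a confirmed schema.

```python
# Hypothetical in-memory form of a Q&A input file after YAML parsing.
# The key names (`seed_examples`, `question`, `answer`) are assumptions.
qna = {
    "seed_examples": [
        {"question": "What is SDG?", "answer": "Synthetic Data Generation."},
        {
            "question": "What input does SDG use?",
            "answer": "A YAML file of question-and-answer pairs.",
        },
    ]
}


def find_incomplete_pairs(data):
    """Return the indexes of examples missing a question or an answer."""
    incomplete = []
    for index, example in enumerate(data.get("seed_examples", [])):
        # Treat missing, None, and whitespace-only values as empty.
        question = str(example.get("question") or "").strip()
        answer = str(example.get("answer") or "").strip()
        if not question or not answer:
            incomplete.append(index)
    return incomplete


print(find_incomplete_pairs(qna))
```

An empty list means every pair has both a question and an answer.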

## Installing the SDG library

Clone the library and navigate to the repo:

```bash
git clone https://github.com/instructlab/sdg
cd sdg
```

Install the library:

```bash
pip install .
```

## Using the library

To use the SDG library, import the following items:

```python
from instructlab.sdg.generate_data import generate_data
from instructlab.sdg.utils import GenerateException
```
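
The sketch below shows one way these two names might be used together, assuming that `generate_data` signals failure by raising `GenerateException`. The README does not document the function's signature, so the stand-in definitions, the `taxonomy` keyword, and the `run_generation` helper are all hypothetical. They exist only so that the example runs without the library installed.

```python
# Stand-ins so this sketch runs without the library installed.
# In real use, replace both definitions with the imports shown above.
class GenerateException(Exception):
    """Placeholder for instructlab.sdg.utils.GenerateException."""


def generate_data(**kwargs):
    """Placeholder for instructlab.sdg.generate_data.generate_data.

    The `taxonomy` keyword is hypothetical; check the library for the
    actual parameters.
    """
    if not kwargs.get("taxonomy"):
        raise GenerateException("no taxonomy provided")


def run_generation(**kwargs):
    """Call generate_data and report a failure instead of raising."""
    try:
        generate_data(**kwargs)
    except GenerateException as exc:
        return f"generation failed: {exc}"
    return "generation succeeded"


print(run_generation(taxonomy="taxonomy/"))
print(run_generation())
```

Catching only `GenerateException` lets unrelated errors, such as a typo in the calling code, surface normally.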

<!-- Not sure what more you're thinking of adding here -->

## Pipelines

SDG uses four pipelines. Each pipeline has specific hardware requirements.

<!--TODO: Add explanations of pipelines-->

*Full* -

*Simple* -

*Schema* -

<!--TODO: Add content here-->

## Repository structure

```text
|-- sdg/src/instructlab/pipelines/ (1)
|-- sdg/src/instructlab/configs/ (2)
|-- sdg/src/instructlab/utils/ (3)
|-- sdg/docs/ (4)
|-- sdg/scripts/ (5)
|-- sdg/tests/ (6)
```

1. Contains the YAML code that configures the SDG pipelines
2.
3.
4.
5.
6. Contains all the CI tests for the SDG repository
