SEED: Domain-Specific Data Curation With Large Language Models

This is the code repo for the paper "SEED: Domain-Specific Data Curation With Large Language Models".

SEED is an approach that leverages Large Language Models (LLMs) to automatically generate domain-specific data curation solutions. By describing a task, input data, and expected output, the SEED compiler produces an executable pipeline consisting of LLM-generated code, small models, and data access modules.

SEED: Legacy Version

Link to the Legacy Version

SEED: Dev Version

Info

Current Version: v0.2.0

Compatibility: Tested on MacBook M1 Pro, macOS 12.6.2, Python 3.9, PyTorch 1.13.1

Hardware Requirements: None

Installation

First, clone the repository and install the prerequisites and the SEED package:

git clone git@github.com:Magolor/SEED.git
cd ./SEED/SeeD/
pip install -r requirements.txt
pip install -e .
cd ..

The most basic configuration is the authentication key for the OpenAI API. You can create an openai_api.json file to store it:

{
    "model": "gpt-4-turbo-preview",
    "api_key": "<YOUR_API_KEY>",
    "organization": "<YOUR_ORGANIZATION>"
}

Otherwise, you will be prompted to enter these values in the terminal during SEED setup.
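
If you prefer not to type the key into a file by hand, the openai_api.json file can be generated with a short script. The following is only a minimal sketch, not part of SEED: the helper name write_openai_config and the environment variable names OPENAI_API_KEY and OPENAI_ORGANIZATION are assumptions made for illustration. This README does not state where SEED looks for the file, so the sketch simply writes it to the current directory.

```python
import json
import os
from pathlib import Path


def write_openai_config(path="openai_api.json", model="gpt-4-turbo-preview"):
    """Write the three fields of openai_api.json and return them as a dict.

    Hypothetical helper for illustration; it is not provided by SEED.
    """
    config = {
        "model": model,
        # Assumption: credentials come from these environment variables.
        # The README placeholders are kept when a variable is unset.
        "api_key": os.environ.get("OPENAI_API_KEY", "<YOUR_API_KEY>"),
        "organization": os.environ.get(
            "OPENAI_ORGANIZATION", "<YOUR_ORGANIZATION>"
        ),
    }
    # Same layout as the JSON example: 4-space indentation, one key per line.
    Path(path).write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    write_openai_config()
```

Reading the key from the environment keeps it out of your shell history and out of any script you might commit. Avoid committing the generated file, since it contains your API key.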

Then run the installer to initialize configurations:

python post_install.py

Tutorials

- Recommended: a full tutorial on how SEED works in general: amazon_google_full_tutorial.
- A short version of the same Amazon-Google project: amazon_google_tutorial.
- A code generation agent tutorial: restaurant_tutorial.
- Others:
  - pubmed_tutorial
  - ...

TODO

SEED is currently under development. Many more features and optimizations are on the way!
