Synthetic Data Generation Pipeline

This repository contains a pipeline for generating and evaluating synthetic data with several generative models (CTGAN, TVAE, GReaT, GaussianCopula, CopulaGAN), including the privacy-preserving variants PATE-CTGAN and DP-CTGAN.

Report & Results

For details on the methodology and results, see our report: Synthetic Data Generation Overleaf report

Project Structure

Core Pipeline (`notebooks/pipeline/main.py`)

The main pipeline implementation handles:

- Data loading and preprocessing
- Model training and evaluation
- Synthetic data generation
- Quality and privacy metrics evaluation
- Results visualization and logging
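
The stages listed above can be sketched end to end with a toy example. This is a minimal illustration using only the Python standard library, not the code in `notebooks/pipeline/main.py`: the generator is a naive independent-column resampler standing in for a real model such as CTGAN, the table values are made up, and every function name (`fit_marginals`, `sample`, `mean_gap`, `distance_to_closest_record`, `run_pipeline`) is hypothetical. The two metrics are simple stand-ins for whatever quality and privacy metrics the pipeline actually reports.

```python
import math
import random

# Toy "real" table: each row is (age, income). Values are made up for illustration.
REAL_ROWS = [(23, 31000.0), (35, 52000.0), (41, 61000.0), (29, 43000.0), (52, 75000.0)]


def fit_marginals(rows):
    """Training stand-in: remember each column's observed values separately."""
    return [list(column) for column in zip(*rows)]


def sample(marginals, n_rows, seed=0):
    """Draw each column independently from its observed values (ignores correlations)."""
    rng = random.Random(seed)
    return [tuple(rng.choice(column) for column in marginals) for _ in range(n_rows)]


def mean_gap(real_rows, synthetic_rows):
    """Quality stand-in: absolute difference of column means, one value per column."""
    gaps = []
    for real_col, synth_col in zip(zip(*real_rows), zip(*synthetic_rows)):
        real_mean = sum(real_col) / len(real_col)
        synth_mean = sum(synth_col) / len(synth_col)
        gaps.append(abs(real_mean - synth_mean))
    return gaps


def distance_to_closest_record(real_rows, synthetic_rows):
    """Privacy stand-in: Euclidean distance from each synthetic row to its nearest real row."""
    return [min(math.dist(s, r) for r in real_rows) for s in synthetic_rows]


def run_pipeline(real_rows, n_rows=100, seed=0):
    """Chain the stages: fit, sample, then score quality and privacy."""
    marginals = fit_marginals(real_rows)         # model training
    synthetic = sample(marginals, n_rows, seed)  # synthetic data generation
    return {
        "synthetic": synthetic,
        "mean_gap": mean_gap(real_rows, synthetic),               # quality
        "dcr": distance_to_closest_record(real_rows, synthetic),  # privacy
    }


if __name__ == "__main__":
    results = run_pipeline(REAL_ROWS)
    print("column mean gaps:", results["mean_gap"])
    print("smallest distance to a real record:", min(results["dcr"]))
```

A distance of zero means a synthetic row exactly reproduces a real record, which this toy resampler can do by chance. The raw Euclidean distance here is dominated by the income column; a practical version would normalize columns first.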

Supported models:

- CTGAN
- TVAE
- GReaT (with LoRA fine-tuning)
- GaussianCopula
- CopulaGAN
- Privacy-preserving models:
  - PATE-CTGAN
  - DP-CTGAN
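
One common way to support several models behind a single interface is a name-to-factory registry. The sketch below is an assumption about how such dispatch could look, not the repository's implementation: `MODEL_REGISTRY`, `register`, `build_model`, and both placeholder classes are hypothetical, and the placeholders do not wrap the real libraries. The `epsilon` argument stands for the privacy budget that differentially private models take; the default values are illustrative only.

```python
from typing import Callable, Dict

# Maps a lookup name to a callable that builds the model (hypothetical registry).
MODEL_REGISTRY: Dict[str, Callable[..., object]] = {}


def register(name: str):
    """Decorator that records a model factory under a lookup name."""
    def decorator(factory):
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register("ctgan")
class PlaceholderCTGAN:
    """Placeholder only; a real entry would wrap the CTGAN library class."""

    def __init__(self, epochs: int = 300):  # illustrative default
        self.epochs = epochs


@register("dp-ctgan")
class PlaceholderDPCTGAN:
    """Placeholder for a differentially private model with a privacy budget."""

    def __init__(self, epsilon: float = 1.0, epochs: int = 300):  # illustrative defaults
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        self.epsilon = epsilon
        self.epochs = epochs


def build_model(name: str, **kwargs):
    """Look up a registered factory by name and instantiate it."""
    try:
        factory = MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"unknown model {name!r}; known models: {known}") from None
    return factory(**kwargs)
```

With this layout, adding a model means registering one more class, and the rest of the pipeline only ever calls `build_model`.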

Exploration & Development

- `notebooks/GReaT_benchmark.ipynb`: Initial experiments with the GReaT model
- `notebooks/dp_models/dp_model_benchmark.ipynb`: Development and testing of differential privacy models
- `eval.ipynb`: Evaluation and visualization of model outputs

Installation

Create and activate a virtual environment:

python -m venv env
source env/bin/activate

Install dependencies:

pip install -r requirements.txt

Usage

Run the pipeline from the repository root, naming the experiment with `--experiment_name`:

python -m notebooks.pipeline.main --experiment_name default_run
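
How `main.py` parses its arguments is not shown in this README. The sketch below is one plausible way to accept `--experiment_name` with `argparse`; the `parse_args` helper, the help text, and the default value are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments; pass a list to test without touching sys.argv."""
    parser = argparse.ArgumentParser(
        description="Synthetic data pipeline (illustrative sketch)"
    )
    parser.add_argument(
        "--experiment_name",
        default="default_run",  # assumed default, mirroring the command above
        help="label attached to this run's logs and results",
    )
    return parser.parse_args(argv)
```

Calling `parse_args(["--experiment_name", "dp_run"])` returns a namespace whose `experiment_name` attribute is `"dp_run"`.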

Features

- Modular architecture supporting multiple synthetic data generation models
- Comprehensive evaluation metrics for quality and privacy
- Integration with Weights & Biases for experiment tracking
- Automated visualization of results
- Support for both standard and privacy-preserving models
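
Experiment tracking with Weights & Biases typically amounts to opening a run, logging a dictionary of metrics, and closing the run. The sketch below shows that pattern with the public `wandb.init`, `Run.log`, and `Run.finish` calls; the project name, metric names and values, and the `log_metrics` helper are assumptions, not taken from this repository. With `use_wandb=False` it returns the payload without importing `wandb`, which keeps the example runnable offline.

```python
def log_metrics(metrics, experiment_name, use_wandb=False, project="synthetic-data-demo"):
    """Log one dictionary of metrics for a named experiment.

    `project` is a hypothetical project name. Returns the payload that was logged.
    """
    payload = {key: float(value) for key, value in metrics.items()}
    if use_wandb:
        import wandb  # imported lazily so the dry-run path has no dependency

        run = wandb.init(project=project, name=experiment_name)
        run.log(payload)
        run.finish()
    return payload


# Dry run with placeholder metric names and values (not real results).
logged = log_metrics({"quality_score": 0.87, "privacy_score": 0.42}, "default_run")
print(logged)
```

Passing `use_wandb=True` requires the `wandb` package and a configured account.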

