From 1ed64e0ca8958575793d756f84a6da59bd98429c Mon Sep 17 00:00:00 2001
From: "Aivin V. Solatorio"
Date: Mon, 6 Feb 2023 21:18:12 -0500
Subject: [PATCH] Update README and bump version

Signed-off-by: Aivin V. Solatorio
---
 README.md                 | 8 ++------
 src/realtabformer/VERSION | 2 +-
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index e9698fa..e1bb613 100644
--- a/README.md
+++ b/README.md
@@ -1,23 +1,19 @@
+Open In Colab
+
# RealTabFormer

The REaLTabFormer (Realistic Relational and Tabular Data using Transformers) offers a unified framework for synthesizing tabular data of different types. A sequence-to-sequence (Seq2Seq) model is used for generating synthetic relational datasets. The REaLTabFormer model for a non-relational tabular data uses GPT-2, and can be used out-of-the-box to model any tabular data with independent observations.

-

-

-

REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers
Paper on ArXiv

-
-

- Tabular data is a common form of organizing data. Multiple models are available to generate synthetic tabular datasets where observations are independent, but few have the ability to produce relational datasets. Modeling relational data is challenging as it requires modeling both a "parent" table and its relationships across tables. We introduce REaLTabFormer (Realistic Relational and Tabular Transformer), a tabular and relational synthetic data generation model. It first creates a parent table using an autoregressive GPT-2 model, then generates the relational dataset conditioned on the parent table using a sequence-to-sequence (Seq2Seq) model. We implement target masking to prevent data copying and propose the $Q_\delta$ statistic and statistical bootstrapping to detect overfitting. Experiments using real-world datasets show that REaLTabFormer captures the relational structure better than a baseline model. REaLTabFormer also achieves state-of-the-art results on prediction tasks, "out-of-the-box", for large non-relational datasets without needing fine-tuning.
diff --git a/src/realtabformer/VERSION b/src/realtabformer/VERSION
index 6e8bf73..17e51c3 100644
--- a/src/realtabformer/VERSION
+++ b/src/realtabformer/VERSION
@@ -1 +1 @@
-0.1.0
+0.1.1