VADIS

This repository contains the code and resources for the paper "VADIS: A Visual Analytics Pipeline for Dynamic Document Representation and Information-Seeking". You can access the paper at https://www.computer.org/csdl/journal/tg/5555/01/10677360/209oqMvDHtm.

Features

Dynamic Document Embeddings: Adjust embeddings based on user queries using the Prompt-based Attention Model (PAM).
Relevance Visualization: Visualize documents in a way that reflects both relevance and similarity.
Interpretability: Understand model focus through attention visualization.
Extensibility: Support for multiple datasets and customizable training parameters.

Project Structure

.
├── arguments.py
├── data/
├── dataloader/
│   ├── dataset.py
├── model/
│   ├── pam.py
│   └── utils.py
├── notebooks/
│   ├── projection_example.ipynb
│   └── setup.py
├── relevance_preserving_map/
│   ├── circular_som.py
├── requirements.txt
├── run.sh
└── train.py

arguments.py: Defines training arguments using HfArgumentParser.
data/: Directory for placing additional training data.
dataloader/: Implements data loading for different datasets.
model/: Contains the implementation of the Prompt-based Attention Model (PAM).
notebooks/: Jupyter notebooks for projection examples.
relevance_preserving_map/: Implementation of the Relevance Preserving Map using Circular Self-Organizing Maps.
run.sh: Shell script to run the training with specified parameters.
train.py: Script for training the PAM.

Usage

Training the Prompt-based Attention Model (PAM)

The PAM generates dynamic document embeddings and relevance scores based on user queries.
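To make the idea of query-dependent embeddings concrete, here is a minimal NumPy sketch of attention pooling conditioned on a query. It is not the PAM implementation in model/pam.py: the function name `query_conditioned_embedding`, the scaled dot-product scoring, and the cosine-based relevance score are illustrative assumptions, and the toy vectors stand in for real encoder outputs.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def query_conditioned_embedding(token_vecs, query_vec):
    """Pool token vectors using attention weights derived from a query.

    Assumed formulation (not taken from the paper): scaled dot-product
    scores, softmax weights, and cosine similarity as the relevance score.
    """
    dim = token_vecs.shape[1]
    scores = token_vecs @ query_vec / np.sqrt(dim)
    weights = softmax(scores)            # one weight per token
    doc_vec = weights @ token_vecs       # weighted average of token vectors
    denom = np.linalg.norm(doc_vec) * np.linalg.norm(query_vec) + 1e-12
    relevance = float(doc_vec @ query_vec / denom)
    return doc_vec, weights, relevance


# Toy input: 4 "tokens" in 8 dimensions; the query points at token 1.
tokens = np.eye(4, 8)
query = 3.0 * tokens[1]
doc_vec, weights, relevance = query_conditioned_embedding(tokens, query)
print(weights.round(3), round(relevance, 3))
```

Changing the query changes the weights, and with them the pooled document vector. The weights themselves are what an attention visualization would display.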

Step 1: Prepare Your Data

Place any additional training data in the data/ directory.

Step 2: Configure Training Parameters

You can set training parameters via command-line arguments or by editing the run.sh script.

#!/bin/bash

python train.py \
  --report_name "training_run" \
  --num_epochs 3 \
  --learning_rate 5e-5 \
  --batch_size 4 \
  --max_length 512 \
  --model_name "bert-base-uncased" \
  --datasets "Your dataset" \
  --use_dual_loss True \
  --entropy_weight 0.01 \
  --load_pretrained_model False \
  --pretrained_model_path "" \
  --model_save_path "./models"
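The flags above map naturally onto a dataclass of training arguments. arguments.py uses HfArgumentParser for this; the sketch below swaps in the standard library's argparse so that it runs without transformers installed. The class name `TrainingConfig` and the helper `str2bool` are hypothetical, and the field names and defaults are simply copied from the run.sh example rather than from arguments.py.

```python
import argparse
from dataclasses import dataclass, fields


def str2bool(value: str) -> bool:
    """Parse 'True'/'False' style strings, as passed in run.sh."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


@dataclass
class TrainingConfig:  # hypothetical name; fields mirror the run.sh flags
    report_name: str = "training_run"
    num_epochs: int = 3
    learning_rate: float = 5e-5
    batch_size: int = 4
    max_length: int = 512
    model_name: str = "bert-base-uncased"
    datasets: str = "Your dataset"
    use_dual_loss: bool = True
    entropy_weight: float = 0.01
    load_pretrained_model: bool = False
    pretrained_model_path: str = ""
    model_save_path: str = "./models"


def parse_config(argv=None) -> TrainingConfig:
    """Build one --flag per dataclass field and parse argv into a config."""
    parser = argparse.ArgumentParser()
    for field in fields(TrainingConfig):
        # Infer the converter from the default value; booleans need str2bool
        # because bool("False") would otherwise evaluate to True.
        is_bool = isinstance(field.default, bool)
        arg_type = str2bool if is_bool else type(field.default)
        parser.add_argument(f"--{field.name}", type=arg_type, default=field.default)
    return TrainingConfig(**vars(parser.parse_args(argv)))


config = parse_config(["--num_epochs", "5", "--use_dual_loss", "False"])
print(config.num_epochs, config.use_dual_loss, config.model_name)
```

Boolean flags are passed as explicit `True`/`False` strings in run.sh, which is why the sketch converts them by hand instead of using `store_true`.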

Relevance Preserving Map Projection

The Relevance Preserving Map uses a Circular Self-Organizing Map (SOM) to lay out documents so that their placement reflects both relevance and semantic similarity. The method is not specific to documents: it can be applied to any dataset whose items carry relevance scores.

Key Features

User-Driven Relevance Adjustment: Documents are placed based on dynamic relevance scores that adapt to the user’s query.
Circular Layout: Documents are visualized on a circular grid, balancing relevance and similarity.
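As a rough illustration of what a circular grid could look like, the sketch below builds neuron positions on concentric rings and maps a relevance score to a ring. The geometry is an assumption made for illustration, not read from circular_som.py: ring k is taken to hold `step * k` neurons at radius k, and higher relevance is taken to mean closer to the center. The function names `circular_grid` and `relevance_to_ring` are hypothetical.

```python
import math


def circular_grid(step: int, layers: int):
    """Return (ring, x, y) tuples for neurons on concentric rings.

    Assumption: ring k (1-based) holds step * k neurons at radius k.
    """
    positions = []
    for ring in range(1, layers + 1):
        count = step * ring
        for i in range(count):
            angle = 2 * math.pi * i / count
            positions.append((ring, ring * math.cos(angle), ring * math.sin(angle)))
    return positions


def relevance_to_ring(relevance: float, layers: int) -> int:
    """Map a relevance score in [0, 1] to a ring index.

    Assumption: the most relevant items sit on the innermost ring.
    """
    clipped = min(max(relevance, 0.0), 1.0)
    return 1 + round((1.0 - clipped) * (layers - 1))


grid = circular_grid(step=8, layers=21)
print(len(grid), relevance_to_ring(1.0, 21), relevance_to_ring(0.0, 21))
```

Under these assumptions the radial coordinate carries relevance, which leaves the angular coordinate free to group similar documents.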

Sample Code for Running the Projection

Here's how to use the Circular SOM to generate a projection of your documents.

import numpy as np
from relevance_preserving_map.circular_som import CircularSOM, get_grid_position_som, plot_som_results

# Sample data: Replace with your document embeddings and relevance scores
data = np.random.rand(100, 300)  # Example: 100 documents, 300 features each
relevance = np.random.rand(100)  # Relevance scores for each document
labels = np.arange(0, 100)       # Labels or identifiers for each document

# Initialize Circular SOM
som = CircularSOM(
    step=8,                       # Number of neurons in the first layer
    layer=21,                     # Number of layers in the circular grid
    input_len=data.shape[1],      # Input dimensionality
    sigma=1.5,                    # Initial neighborhood size
    learning_rate=0.7,            # Initial learning rate
    activation_distance='euclidean',
    topology='circular',
    neighborhood_function='gaussian',
    random_seed=10
)

# Train the SOM
som.train(
    data=data,
    relevance_score=relevance,
    num_iteration=1000,  # Adjust as needed
    w_s=0.2,             # Weight for similarity
    w_r=0.8,             # Weight for relevance
    verbose=True,
    report_error=True,
    use_sorted=True
)

# Get grid positions after training
ids_same_order = np.arange(data.shape[0])
data_grid_positions = get_grid_position_som(som, data, relevance, ids_same_order)

# Visualize the results
plot_som_results(som, data, labels, relevance, sort=True)
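The `w_s` and `w_r` arguments trade off similarity against relevance. One plausible way to read them is as weights in the cost used to pick a document's best-matching neuron; the sketch below shows that reading. It is an assumption for illustration only, not the logic in circular_som.py: the function `weighted_bmu`, the per-neuron relevance levels, and the max-normalization of each term are all hypothetical.

```python
import numpy as np


def weighted_bmu(doc_vec, doc_rel, neuron_vecs, neuron_rel, w_s=0.2, w_r=0.8):
    """Index of the neuron minimizing a weighted similarity + relevance cost."""
    feat_dist = np.linalg.norm(neuron_vecs - doc_vec, axis=1)
    rel_dist = np.abs(neuron_rel - doc_rel)
    # Scale each term to [0, 1] so the two weights are comparable.
    feat_dist = feat_dist / (feat_dist.max() + 1e-12)
    rel_dist = rel_dist / (rel_dist.max() + 1e-12)
    cost = w_s * feat_dist + w_r * rel_dist
    return int(np.argmin(cost))


# Three toy neurons: weight vectors and the relevance level each represents.
neuron_vecs = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])
neuron_rel = np.array([0.1, 0.9, 0.8])
doc_vec, doc_rel = np.array([0.0, 0.0]), 0.9

similarity_only = weighted_bmu(doc_vec, doc_rel, neuron_vecs, neuron_rel, w_s=1.0, w_r=0.0)
balanced = weighted_bmu(doc_vec, doc_rel, neuron_vecs, neuron_rel, w_s=0.2, w_r=0.8)
print(similarity_only, balanced)
```

With similarity alone, the document goes to the neuron with the closest weight vector. With the balanced weights, it moves to a neuron that is still nearby in feature space but whose relevance level is a much better match.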

Notebook

For a detailed example, refer to the Jupyter notebook notebooks/projection_example.ipynb.

TODO

Open-source the component-based frontend system

BibTeX

@article{qiu2024vadis,
  title={VADIS: A Visual Analytics Pipeline for Dynamic Document Representation and Information-Seeking},
  author={Qiu, Rui and Tu, Yamei and Yen, Po-Yin and Shen, Han-Wei},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  year={2024},
  publisher={IEEE},
  doi={10.1109/TVCG.2024.10677360},  % You can replace this with the actual DOI if available
  url={https://www.computer.org/csdl/journal/tg/5555/01/10677360/209oqMvDHtm}
}
