Improve documentation of Installation/Configuration
tiadams authored Feb 21, 2024
1 parent b226811 commit 3bce331
Showing 1 changed file with 23 additions and 11 deletions.
34 changes: 23 additions & 11 deletions README.md
@@ -3,20 +3,15 @@
INDEX is an intelligent data steward toolbox that leverages Large Language Model embeddings for automated Data-Harmonization.

## Table of Contents
-- [Introduction](##ntroduction)
-- [Installation & Usage](#installation)
+- [Introduction](#introduction)
+- [Installation](#installation)
+- [Configuration](#configuration)

## Introduction

-INDEX relies on vector embeddings calculated based on variable descriptions to generate mapping suggestions for any
-dataset, enabling efficient and accurate data indexing and retrieval. Confirmed mappings are stored alongside their
-vectorized representations in a knowledge base, facilitating rapid search and retrieval operations, ultimately enhancing
-data management and analysis capabilities. New mappings may be added to the knowledge base in an iterative procedure,
-allowing for improved mapping suggestions in subsequent harmonization tasks.
-
-## Installation & Usage
+INDEX uses vector embeddings from variable descriptions to suggest mappings for datasets, improving data indexing and retrieval. Confirmed mappings are stored with their vector representations in a knowledge base for fast search and retrieval, enhancing data management and analysis. New mappings can be added iteratively to improve suggestions for future harmonization tasks.
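The suggestion workflow described above can be sketched as follows. This is a toy illustration only: it uses bag-of-words vectors and cosine similarity in place of the language-model embeddings INDEX actually computes, and the variable descriptions in the knowledge base are hypothetical.

```python
# Toy sketch of embedding-based mapping suggestion (hypothetical data;
# INDEX itself uses language-model embeddings, not bag-of-words vectors).
import math
from collections import Counter

def embed(description: str) -> Counter:
    # Stand-in for an LLM embedding: a simple bag-of-words vector.
    return Counter(description.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Knowledge base of confirmed mappings: description -> harmonized variable.
knowledge_base = {
    "age of the patient in years": "age",
    "systolic blood pressure in mmhg": "sbp",
}

def suggest(description: str) -> str:
    # Return the stored description closest to the query description.
    query = embed(description)
    return max(knowledge_base, key=lambda d: cosine(query, embed(d)))

print(knowledge_base[suggest("patient age in years")])  # prints: age
```

Confirming a new mapping then amounts to adding its description and vector to `knowledge_base`, which is what makes suggestions improve iteratively.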

## Installation
Clone the repository:

```bash
git clone https://github.com/SCAI-BIO/index
```

@@ -41,12 +36,23 @@ uvicorn main:app --reload --port 5000

### Run the Backend via Docker

-Download the latest docker build:
+You can either build the Docker container locally or download the latest build from the INDEX GitHub package registry.


```bash
docker build . -t ghcr.io/scai-bio/backend:latest
```

```bash
docker pull ghcr.io/scai-bio/backend:latest
```

After the build/download you will be able to start the container and access the INDEX API, by default on [localhost:8000](http://localhost:8000):

```bash
docker run -p 8000:80 ghcr.io/scai-bio/backend:latest
```

## Configuration

### Description Embeddings
@@ -55,4 +61,10 @@
You can configure INDEX to use either a local language model or call OpenAI's embeddings API. While the OpenAI API is significantly faster, you will need to provide an API key that is linked to your OpenAI account.

Currently, the following local models are implemented:
* [MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)

The API defaults to a local embedding model. You can adjust which model is loaded on startup in the configuration.

### Database

By default, INDEX stores mappings in a file-based database in the [following directory](https://github.com/SCAI-BIO/index/tree/main/index/db).
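A minimal sketch of what such a file-based mapping store might look like. The use of SQLite and this particular schema are assumptions for illustration, not INDEX's actual implementation; an in-memory database stands in for the on-disk file here for brevity.

```python
# Hypothetical file-based mapping store (schema and backend are
# illustrative; INDEX's actual storage may differ).
import json
import sqlite3

# An on-disk store would pass a file path instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mappings (description TEXT PRIMARY KEY, target TEXT, embedding TEXT)"
)

def store_mapping(description, target, embedding):
    # Vectors are serialized to JSON so they survive round-trips to disk.
    conn.execute(
        "INSERT OR REPLACE INTO mappings VALUES (?, ?, ?)",
        (description, target, json.dumps(embedding)),
    )

store_mapping("age of the patient in years", "age", [0.1, 0.2])
row = conn.execute(
    "SELECT target, embedding FROM mappings WHERE description = ?",
    ("age of the patient in years",),
).fetchone()
print(row[0], json.loads(row[1]))  # prints: age [0.1, 0.2]
```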
