Showing 14 changed files with 124 additions and 48 deletions.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,68 +1,55 @@ | ||
# INDEX – the Intelligent Data Steward Toolbox | ||
|
||
![example workflow](https://github.com/SCAI-BIO/index/actions/workflows/tests.yml/badge.svg) ![GitHub Release](https://img.shields.io/github/v/release/SCAI-BIO/index) | ||
![example workflow](https://github.com/SCAI-BIO/index/actions/workflows/tests.yml/badge.svg) | ||
![GitHub Release](https://img.shields.io/github/v/release/SCAI-BIO/index) | ||
|
||
INDEX is an intelligent data steward toolbox that leverages Large Language Model embeddings for automated Data-Harmonization. | ||
INDEX is an intelligent data steward toolbox that leverages Large Language Model embeddings for automated Data-Harmonization. | ||
|
||
## Table of Contents | ||
- [Introduction](#introduction) | ||
- [Installation](#installation) | ||
- [Usage](#usage) | ||
- [Local Development Server](#local-development-server) | ||
- [Starting the backend](#starting-the-backend) | ||
- [Starting the frontend](#starting-the-frontend) | ||
- [Docker](#docker) | ||
- [Configuration](#configuration) | ||
|
||
## Introduction | ||
|
||
INDEX uses vector embeddings from variable descriptions to suggest mappings for datasets based on their semantic | ||
similarity. Mappings are stored with their vector representations in a knowledge base, where they can be used for | ||
subsequent harmonisation tasks, potentially improving the following suggestions with each iteration. Models for | ||
the computation as well as databases for storage are meant to be configurable and extendable to adapt the tool for | ||
specific use-cases. | ||
INDEX uses vector embeddings from variable descriptions to suggest mappings for datasets based on their semantic similarity. Mappings are stored with their vector representations in a knowledge base, where they can be used for subsequent harmonisation tasks, potentially improving suggestions with each iteration. The tool is designed to be configurable and extendable, adapting for specific use-cases through customizable models and databases. | ||
|
||
## Installation | ||
|
||
```bash | ||
uvicorn api.routes:app --reload --port 5000 | ||
``` | ||
|
||
### Run the Backend via Docker | ||
|
||
The API can also be run via docker. | ||
|
||
You can either build the docker container locally or download the latest build from the index GitHub package registry. | ||
### Local Development Server | ||
|
||
#### Starting the backend | ||
|
||
```bash | ||
docker build . -t ghcr.io/scai-bio/api/backend:latest | ||
cd api | ||
pip install -r requirements.txt | ||
uvicorn routes:app --reload --port 5000 | ||
``` | ||
|
||
```bash | ||
docker pull ghcr.io/scai-bio/api/backend:latest | ||
``` | ||
Navigate to [localhost:5000](http://localhost:5000) to access the backend. | ||
|
||
After build/download you will be able to start the container and access the INDEX API per default on [localhost:5000](http://localhost:8000): | ||
#### Starting the frontend | ||
|
||
```bash | ||
docker run -p 8000:80 ghcr.io/api/scai-bio/backend:latest | ||
cd client | ||
pip install -r requirements.txt | ||
uvicorn routes:app --reload --port 5000 | ||
``` | ||
|
||
## Configuration | ||
|
||
### Description Embeddings | ||
|
||
You can configure INDEX to use either a local language model or call OPenAPIs embedding API. While using the OpenAI API | ||
is significantly faster, you will need to provide an API key that is linked to your OpenAI account. | ||
Navigate to [localhost:4200](http://localhost:4200) to access the frontend. | ||
|
||
Currently, the following local models are implemented: | ||
* [Sentence Transformer (MPNet)](https://huggingface.co/docs/transformers/model_doc/mpnet) | ||
### Docker | ||
|
||
The API will default to use a local embedding model. You can adjust the model loaded on start up in the configurations. | ||
You can start both frontend and API using docker-compose: | ||
|
||
### Database | ||
```bash | ||
docker-compose -f docker-compose.local.yaml up | ||
``` | ||
|
||
INDEX will by default store mappings in a file based db file in the [index/db](api/db) dir. For testing purposes | ||
the initial SQLLite file based db contains a few of mappings to concepts in SNOMED CT. All available database adapter | ||
implementations can be found in [index/repository](api/repository). | ||
## Configuration | ||
|
||
To exchange the DB implementation, load your custom DB adapter or pre-saved file-based DB file on application startup | ||
[here](https://github.com/SCAI-BIO/index/blob/923601677fd62d50c3748b7f11666420e82df609/index/api/routes.py#L14). | ||
The same can be done for any other embedding model. | ||
_TODO: Add configuration instructions_ |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# FIRST STAGE: Build the Angular Application | ||
FROM node:20.12.2 as build | ||
|
||
# Set the working directory | ||
WORKDIR /app | ||
|
||
# Copy the current directory contents into the working directory | ||
COPY . . | ||
|
||
# Update lock file | ||
RUN npm install | ||
|
||
# Install dependencies | ||
RUN npm ci | ||
|
||
# Install Angular CLI globally | ||
RUN npm install -g @angular/cli | ||
|
||
# Build the Angular application | ||
RUN ng build --configuration=development | ||
|
||
# SECOND STAGE: Serve the application using Nginx | ||
FROM docker.io/library/nginx:1.26.0 | ||
|
||
# Copy the built application from the previous stage | ||
COPY --from=build /app/dist/client/browser /usr/share/nginx/html | ||
|
||
# Expose port 80 | ||
EXPOSE 80 | ||
|
||
# Start Nginx serve | ||
CMD ["nginx", "-g", "daemon off;"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# FIRST STAGE: Build the Angular Application | ||
FROM node:20.12.2 as build | ||
|
||
# Set the working directory | ||
WORKDIR /app | ||
|
||
# Copy the current directory contents into the working directory | ||
COPY . . | ||
|
||
# Update lock file | ||
RUN npm install | ||
|
||
# Install dependencies | ||
RUN npm ci | ||
|
||
# Install Angular CLI globally | ||
RUN npm install -g @angular/cli | ||
|
||
# Build the Angular application | ||
RUN ng build --configuration=production | ||
|
||
# SECOND STAGE: Serve the application using Nginx | ||
FROM docker.io/library/nginx:1.26.0 | ||
|
||
# Copy the built application from the previous stage | ||
COPY --from=build /app/dist/client/browser /usr/share/nginx/html | ||
|
||
# Expose port 80 | ||
EXPOSE 80 | ||
|
||
# Start Nginx serve | ||
CMD ["nginx", "-g", "daemon off;"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -93,5 +93,8 @@ | |
} | ||
} | ||
} | ||
}, | ||
"cli": { | ||
"analytics": false | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
export const environment: { openApiUrl: string } = { | ||
- openApiUrl: 'http://193.175.165.153:8000', | ||
+ openApiUrl: 'http://localhost:5000', | ||
}; |
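
The hunk above changes `openApiUrl` from a fixed IP address to `http://localhost:5000`, which matches the backend port used elsewhere in this commit. As a minimal sketch of how client code might consume that setting, the snippet below joins the base URL with an API path. The `apiUrl` helper, the `Environment` interface, and the `/mappings` path are hypothetical illustrations and are not part of this commit.

```typescript
// Shape of the environment object, as declared inline in the diff.
interface Environment {
  openApiUrl: string;
}

// Value copied from the updated environment file in this hunk.
const environment: Environment = {
  openApiUrl: 'http://localhost:5000',
};

// Hypothetical helper: build an absolute endpoint URL from the configured
// base URL, tolerating stray slashes on either side of the join.
function apiUrl(env: Environment, path: string): string {
  const base = env.openApiUrl.replace(/\/+$/, '');
  const suffix = path.replace(/^\/+/, '');
  return `${base}/${suffix}`;
}

// '/mappings' is an assumed path used only for illustration.
console.log(apiUrl(environment, '/mappings'));
```

Keeping the base URL in one environment object means the same client code can target the local backend or the hosted instance by swapping only the environment file.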
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
- // export const environment: { openApiUrl: string } = { | ||
- // openApiUrl: 'https://index.bio.scai.fraunhofer.de', | ||
- // }; | ||
+ export const environment: { openApiUrl: string } = { | ||
+ openApiUrl: 'https://index.bio.scai.fraunhofer.de', | ||
+ }; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,3 @@ | ||
export const environment: { openApiUrl: string } = { | ||
- //openApiUrl: 'https://index.bio.scai.fraunhofer.de', | ||
- openApiUrl: 'http://193.175.165.153:8000', | ||
+ openApiUrl: 'http://localhost:5000', | ||
}; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
version: "3.12" | ||
|
||
services: | ||
|
||
frontend: | ||
image: index-client | ||
build: | ||
context: ./client | ||
dockerfile: Dockerfile.dev | ||
ports: | ||
- "4200:80" | ||
depends_on: | ||
- backend | ||
|
||
backend: | ||
image: index-api | ||
build: | ||
context: ./api | ||
dockerfile: Dockerfile | ||
ports: | ||
- "5000:80" |
Empty file.
Empty file.