add: Dockerfiles
tiadams committed Jul 9, 2024
1 parent 942a917 commit 33bda96
Showing 14 changed files with 124 additions and 48 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/docker-package.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: Python package
name: Docker package

on:
  release:
@@ -28,3 +28,4 @@ jobs:
          tags: |
            ghcr.io/scai-bio/index/backend:latest
            ghcr.io/scai-bio/index/backend:${{ steps.version.outputs.VERSION }}
      - name: Build & push frontend
File renamed without changes.
65 changes: 26 additions & 39 deletions README.md
@@ -1,68 +1,55 @@
# INDEX – the Intelligent Data Steward Toolbox

![example workflow](https://github.com/SCAI-BIO/index/actions/workflows/tests.yml/badge.svg) ![GitHub Release](https://img.shields.io/github/v/release/SCAI-BIO/index)
![example workflow](https://github.com/SCAI-BIO/index/actions/workflows/tests.yml/badge.svg)
![GitHub Release](https://img.shields.io/github/v/release/SCAI-BIO/index)

INDEX is an intelligent data steward toolbox that leverages Large Language Model embeddings for automated Data-Harmonization.
INDEX is an intelligent data steward toolbox that leverages Large Language Model embeddings for automated Data-Harmonization.

## Table of Contents
- [Introduction](#introduction)
- [Installation](#installation)
- [Usage](#usage)
- [Local Development Server](#local-development-server)
- [Starting the backend](#starting-the-backend)
- [Starting the frontend](#starting-the-frontend)
- [Docker](#docker)
- [Configuration](#configuration)

## Introduction

INDEX uses vector embeddings from variable descriptions to suggest mappings for datasets based on their semantic
similarity. Mappings are stored with their vector representations in a knowledge base, where they can be used for
subsequent harmonisation tasks, potentially improving the following suggestions with each iteration. Models for
the computation as well as databases for storage are meant to be configurable and extendable to adapt the tool for
specific use-cases.
INDEX uses vector embeddings from variable descriptions to suggest mappings for datasets based on their semantic similarity. Mappings are stored with their vector representations in a knowledge base, where they can be used for subsequent harmonisation tasks, potentially improving suggestions with each iteration. The tool is designed to be configurable and extendable, adapting for specific use-cases through customizable models and databases.
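The idea of suggesting mappings by semantic similarity can be illustrated with a minimal sketch. It assumes cosine similarity as the ranking metric and uses tiny hand-written vectors in place of real model embeddings; the function names `cosine_similarity` and `suggest_mappings` and the toy knowledge base are hypothetical and are not part of the INDEX code base.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_mappings(query_vector, knowledge_base, top_k=3):
    """Rank stored concepts by similarity to the query and return the top_k (name, score) pairs."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in knowledge_base.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (assumption: real embeddings come from a language model).
knowledge_base = {
    "body weight": [0.9, 0.1, 0.0],
    "body height": [0.7, 0.6, 0.1],
    "smoking status": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a new variable description such as "weight in kg".
query = [0.85, 0.2, 0.05]
ranked = suggest_mappings(query, knowledge_base, top_k=2)
```

In this sketch a confirmed mapping would be added to `knowledge_base` together with its vector, which is how later suggestions could improve over time.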

## Installation

```bash
uvicorn api.routes:app --reload --port 5000
```

### Run the Backend via Docker

The API can also be run via docker.

You can either build the docker container locally or download the latest build from the index GitHub package registry.
### Local Development Server

#### Starting the backend

```bash
docker build . -t ghcr.io/scai-bio/api/backend:latest
cd api
pip install -r requirements.txt
uvicorn routes:app --reload --port 5000
```

```bash
docker pull ghcr.io/scai-bio/api/backend:latest
```
Navigate to [localhost:5000](http://localhost:5000) to access the backend.

After building or downloading the image, start the container; the INDEX API is then available by default at [localhost:8000](http://localhost:8000):
#### Starting the frontend

```bash
docker run -p 8000:80 ghcr.io/scai-bio/api/backend:latest
cd client
npm install
# assumes the Angular CLI is installed (npm install -g @angular/cli)
ng serve
```

## Configuration

### Description Embeddings

You can configure INDEX to use either a local language model or OpenAI's embedding API. While using the OpenAI API
is significantly faster, you will need to provide an API key that is linked to your OpenAI account.
Navigate to [localhost:4200](http://localhost:4200) to access the frontend.

Currently, the following local models are implemented:
* [Sentence Transformer (MPNet)](https://huggingface.co/docs/transformers/model_doc/mpnet)
### Docker

The API will default to use a local embedding model. You can adjust the model loaded on start up in the configurations.
You can start both frontend and API using docker-compose:

### Database
```bash
docker-compose -f docker-compose.local.yaml up
```
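After the containers start, a short script can confirm that both services answer. The following is a minimal sketch, assuming the host ports from `docker-compose.local.yaml` (4200 for the frontend, 5000 for the backend) and the `/docs` page that the backend root redirects to; the names `SERVICES`, `is_up` and `wait_for_services` are hypothetical helpers, not part of the repository.

```python
import time
import urllib.error
import urllib.request

# Host-side URLs, assuming the port mappings "4200:80" and "5000:80" from the compose file.
SERVICES = {
    "frontend": "http://localhost:4200",
    "backend": "http://localhost:5000/docs",
}


def is_up(url, timeout=2.0):
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server responded, but with an error status.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, ...
        return False


def wait_for_services(services, attempts=30, delay=1.0, probe=is_up):
    """Poll each service until it responds; return the sorted names still unreachable."""
    pending = dict(services)
    for _ in range(attempts):
        pending = {name: url for name, url in pending.items() if not probe(url)}
        if not pending:
            return []
        time.sleep(delay)
    return sorted(pending)
```

Calling `wait_for_services(SERVICES)` returns an empty list once both containers respond, or the names of the services that never became reachable. The `probe` parameter exists only so the polling logic can be exercised without a running stack.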

By default, INDEX stores mappings in a file-based database in the [index/db](api/db) directory. For testing purposes,
the initial file-based SQLite database contains a few mappings to concepts in SNOMED CT. All available database adapter
implementations can be found in [index/repository](api/repository).
## Configuration

To exchange the DB implementation, load your custom DB adapter or pre-saved file-based DB file on application startup
[here](https://github.com/SCAI-BIO/index/blob/923601677fd62d50c3748b7f11666420e82df609/index/api/routes.py#L14).
The same can be done for any other embedding model.
_TODO: Add configuration instructions_
5 changes: 3 additions & 2 deletions api/routes.py
@@ -1,6 +1,6 @@
import logging
import json
import uvicorn

from fastapi import FastAPI, HTTPException
from starlette.middleware.cors import CORSMiddleware
from starlette.responses import RedirectResponse, HTMLResponse
@@ -11,7 +11,7 @@
from datastew.visualisation import get_html_plot_for_current_database_state

logger = logging.getLogger("uvicorn.info")
repository = SQLLiteRepository(mode="memory")
repository = SQLLiteRepository(mode="disk", path="snomed.db")
embedding_model = MPNetAdapter()
db_plot_html = None

@@ -56,6 +56,7 @@
    allow_headers=["*"],
)


@app.get("/", include_in_schema=False)
def swagger_redirect():
    return RedirectResponse(url='/docs')
Binary file added api/snomed.db
Binary file not shown.
32 changes: 32 additions & 0 deletions client/Dockerfile.dev
@@ -0,0 +1,32 @@
# FIRST STAGE: Build the Angular Application
FROM node:20.12.2 as build

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the working directory
COPY . .

# Update lock file
RUN npm install

# Install dependencies
RUN npm ci

# Install Angular CLI globally
RUN npm install -g @angular/cli

# Build the Angular application
RUN ng build --configuration=development

# SECOND STAGE: Serve the application using Nginx
FROM docker.io/library/nginx:1.26.0

# Copy the built application from the previous stage
COPY --from=build /app/dist/client/browser /usr/share/nginx/html

# Expose port 80
EXPOSE 80

# Start Nginx serve
CMD ["nginx", "-g", "daemon off;"]
32 changes: 32 additions & 0 deletions client/Dockerfile.prod
@@ -0,0 +1,32 @@
# FIRST STAGE: Build the Angular Application
FROM node:20.12.2 as build

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the working directory
COPY . .

# Update lock file
RUN npm install

# Install dependencies
RUN npm ci

# Install Angular CLI globally
RUN npm install -g @angular/cli

# Build the Angular application
RUN ng build --configuration=production

# SECOND STAGE: Serve the application using Nginx
FROM docker.io/library/nginx:1.26.0

# Copy the built application from the previous stage
COPY --from=build /app/dist/client/browser /usr/share/nginx/html

# Expose port 80
EXPOSE 80

# Start Nginx serve
CMD ["nginx", "-g", "daemon off;"]
3 changes: 3 additions & 0 deletions client/angular.json
@@ -93,5 +93,8 @@
        }
      }
    }
  },
  "cli": {
    "analytics": false
  }
}
2 changes: 1 addition & 1 deletion client/src/environments/environment.development.ts
@@ -1,3 +1,3 @@
export const environment: { openApiUrl: string } = {
  openApiUrl: 'http://193.175.165.153:8000',
  openApiUrl: 'http://localhost:5000',
};
6 changes: 3 additions & 3 deletions client/src/environments/environment.production.ts
@@ -1,3 +1,3 @@
// export const environment: { openApiUrl: string } = {
// openApiUrl: 'https://index.bio.scai.fraunhofer.de',
// };
export const environment: { openApiUrl: string } = {
  openApiUrl: 'https://index.bio.scai.fraunhofer.de',
};
3 changes: 1 addition & 2 deletions client/src/environments/environment.ts
@@ -1,4 +1,3 @@
export const environment: { openApiUrl: string } = {
  //openApiUrl: 'https://index.bio.scai.fraunhofer.de',
  openApiUrl: 'http://193.175.165.153:8000',
  openApiUrl: 'http://localhost:5000',
};
21 changes: 21 additions & 0 deletions docker-compose.local.yaml
@@ -0,0 +1,21 @@
version: "3.12"

services:

  frontend:
    image: index-client
    build:
      context: ./client
      dockerfile: Dockerfile.dev
    ports:
      - "4200:80"
    depends_on:
      - backend

  backend:
    image: index-api
    build:
      context: ./api
      dockerfile: Dockerfile
    ports:
      - "5000:80"
Empty file removed ui/.gitkeep
Empty file.
Empty file removed ui/src/app/app.component.scss
Empty file.
