Adds audio querying to MultimodalQ&A Example #1225

Merged: 20 commits, Dec 12, 2024

Changes from 7 commits

Commits (20):
- 3d22cf5 Build, compose, and doc updates for audio query (#12) (okhleif-IL, Dec 2, 2024)
- 22a5b15 MMQnA doc update correcting ASR and whisper image names (#24) (dmsuehir, Dec 3, 2024)
- 84ec278 Integrate audio query into UI (#22) (mhbuehler, Dec 4, 2024)
- c9fe70e Temporarily redirect clones for tests (mhbuehler, Dec 4, 2024)
- 7f7236d Merge pull request #25 from mhbuehler/melanie/redirect_clones_for_tests (mhbuehler, Dec 4, 2024)
- 56db11a Merge branch 'main' into mmqna-audio-query (mhbuehler, Dec 4, 2024)
- f67146f [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Dec 4, 2024)
- fdf5a08 Merge branch 'main' into mmqna-audio-query (mhbuehler, Dec 5, 2024)
- aa5a5ac Add services to tests and correct small text error (#26) (mhbuehler, Dec 5, 2024)
- 30e33a6 Fixed build.yaml inconsistency (mhbuehler, Dec 6, 2024)
- 9ba341b Merge branch 'main' into mmqna-audio-query (ashahba, Dec 6, 2024)
- 54c82ac Merge pull request #27 from mhbuehler/melanie/whisper_image_name (mhbuehler, Dec 7, 2024)
- bcabb36 Merge branch 'main' into mmqna-audio-query (mhbuehler, Dec 7, 2024)
- 02b87b0 Update repo clones for E2E tests (mhbuehler, Dec 9, 2024)
- c421e68 Merge pull request #30 from mhbuehler/melanie/revert_clones (mhbuehler, Dec 9, 2024)
- f839c43 Moved Audio Query Gateway changes to multimodalqna.py (#29) (okhleif-IL, Dec 10, 2024)
- 674c975 Merge branch 'main' into mmqna-audio-query (mhbuehler, Dec 10, 2024)
- 31da576 Removed proxies arg from translation request (#31) (okhleif-IL, Dec 10, 2024)
- ba1fd52 Merge branch 'main' into mmqna-audio-query (mhbuehler, Dec 10, 2024)
- 55585ab Merge branch 'main' into mmqna-audio-query (ashahba, Dec 12, 2024)
57 changes: 46 additions & 11 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/README.md
@@ -78,6 +78,9 @@ export https_proxy=${your_http_proxy}
export EMBEDDER_PORT=6006
export MMEI_EMBEDDING_ENDPOINT="http://${host_ip}:$EMBEDDER_PORT/v1/encode"
export MM_EMBEDDING_PORT_MICROSERVICE=6000
export ASR_ENDPOINT=http://$host_ip:7066
export ASR_SERVICE_PORT=3001
export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
export REDIS_URL="redis://${host_ip}:6379"
export REDIS_HOST=${host_ip}
export INDEX_NAME="mm-rag-redis"
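For orientation: `ASR_ENDPOINT` points at the internal Whisper dependency server (port 7066), while `ASR_SERVICE_ENDPOINT` is the ASR microservice that the MegaService calls for audio transcription. A minimal sketch to confirm both are set after exporting the variables above:

```bash
# Sketch: print the two ASR-related endpoints (Whisper server vs. ASR microservice)
echo "Whisper server (dependency):   ${ASR_ENDPOINT}"
echo "ASR microservice (public API): ${ASR_SERVICE_ENDPOINT}"
```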
@@ -144,7 +147,21 @@ docker build --no-cache -t opea/lvm-llava-svc:latest --build-arg https_proxy=$ht
docker build --no-cache -t opea/dataprep-multimodal-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimodal/redis/langchain/Dockerfile .
```

### 5. Build MegaService Docker Image
### 5. Build ASR Images

Build the Whisper server image:

```bash
docker build --no-cache -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .
```

Build the ASR microservice image:

```bash
docker build --no-cache -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
```
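As an optional check (not part of the original steps), confirm that both audio images were tagged:

```bash
# Sketch: list the two freshly built audio images
docker images | grep -E 'opea/(whisper|asr)'
```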

### 6. Build MegaService Docker Image

To construct the MegaService, we use the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the [multimodalqna.py](../../../../multimodalqna.py) Python script. Build the MegaService Docker image with the command below:

@@ -155,7 +172,7 @@ docker build --no-cache -t opea/multimodalqna:latest --build-arg https_proxy=$ht
cd ../..
```

### 6. Build UI Docker Image
### 7. Build UI Docker Image

Build the frontend Docker image with the command below:

@@ -165,16 +182,19 @@ docker build --no-cache -t opea/multimodalqna-ui:latest --build-arg https_proxy=
cd ../../../
```

Then run the command `docker images`, you will have the following 8 Docker Images:
Then run the command `docker images`; you should see the following 11 Docker images:

1. `opea/dataprep-multimodal-redis:latest`
2. `opea/lvm-llava-svc:latest`
3. `opea/lvm-llava:latest`
4. `opea/retriever-multimodal-redis:latest`
5. `opea/embedding-multimodal:latest`
6. `opea/embedding-multimodal-bridgetower:latest`
7. `opea/multimodalqna:latest`
8. `opea/multimodalqna-ui:latest`
5. `opea/whisper:latest`
6. `opea/asr:latest`
7. `opea/redis-vector-db`
8. `opea/embedding-multimodal:latest`
9. `opea/embedding-multimodal-bridgetower:latest`
10. `opea/multimodalqna:latest`
11. `opea/multimodalqna-ui:latest`

## 🚀 Start Microservices

@@ -240,7 +260,16 @@ curl http://${host_ip}:7000/v1/multimodal_retrieval \
-d "{\"text\":\"test\",\"embedding\":${your_embedding}}"
```

4. lvm-llava
4. asr

```bash
curl ${ASR_SERVICE_ENDPOINT} \
-X POST \
-H "Content-Type: application/json" \
-d '{"byte_str" : "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
```
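The `byte_str` value is a base64-encoded WAV file. To transcribe your own recording instead of the bundled sample, one option is to encode it inline; a sketch, where `sample.wav` is a placeholder filename and `base64 -w 0` assumes GNU coreutils:

```bash
# Sketch: base64-encode a local WAV file and send it to the ASR microservice
curl ${ASR_SERVICE_ENDPOINT} \
  -X POST \
  -H "Content-Type: application/json" \
  -d "{\"byte_str\": \"$(base64 -w 0 sample.wav)\"}"
```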

5. lvm-llava

```bash
curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
@@ -249,7 +278,7 @@ curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
-d '{"prompt":"Describe the image please.", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}'
```

5. lvm-llava-svc
6. lvm-llava-svc

```bash
curl http://${host_ip}:9399/v1/lvm \
@@ -274,7 +303,7 @@ curl http://${host_ip}:9399/v1/lvm \
-d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
```

6. dataprep-multimodal-redis
7. dataprep-multimodal-redis

Download a sample video, image, and audio file and create a caption

@@ -348,7 +377,7 @@ curl -X POST \
${DATAPREP_DELETE_FILE_ENDPOINT}
```

7. MegaService
8. MegaService

```bash
curl http://${host_ip}:8888/v1/multimodalqna \
@@ -357,6 +386,12 @@ curl http://${host_ip}:8888/v1/multimodalqna \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```

```bash
curl http://${host_ip}:8888/v1/multimodalqna \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'
```
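To send your own audio through the MegaService, the same base64 approach works. Below is a sketch that builds the `messages` payload with `jq` (assumes `jq` is installed; `sample.wav` is a placeholder filename):

```bash
# Sketch: wrap a local WAV file in the multimodal chat message format and query the MegaService
payload=$(jq -n --arg audio "$(base64 -w 0 sample.wav)" \
  '{messages: [{role: "user", content: [{type: "audio", audio: $audio}]}]}')
curl http://${host_ip}:8888/v1/multimodalqna \
  -H "Content-Type: application/json" \
  -d "$payload"
```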

```bash
curl http://${host_ip}:8888/v1/multimodalqna \
-H "Content-Type: application/json" \
24 changes: 24 additions & 0 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/compose.yaml
@@ -2,6 +2,27 @@
# SPDX-License-Identifier: Apache-2.0

services:
whisper-service:
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
container_name: whisper-service
ports:
- "7066:7066"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped
asr:
image: ${REGISTRY:-opea}/asr:${TAG:-latest}
container_name: asr-service
ports:
- "${ASR_SERVICE_PORT}:9099"
ipc: host
environment:
ASR_ENDPOINT: ${ASR_ENDPOINT}
ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
@@ -102,6 +123,7 @@ services:
- embedding-multimodal
- retriever-multimodal-redis
- lvm-llava-svc
- asr
ports:
- "8888:8888"
environment:
@@ -113,6 +135,8 @@
MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE}
MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP}
LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP}
ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
ipc: host
restart: always
multimodalqna-ui:
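Once the updated compose file is running, a quick sanity check that both new audio containers are up (a sketch; the names come from the `container_name` fields above):

```bash
# Sketch: verify the Whisper server and ASR microservice containers are running
docker ps --filter "name=whisper-service" --filter "name=asr-service" \
  --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
```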
3 changes: 3 additions & 0 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh
@@ -12,6 +12,9 @@ export https_proxy=${your_http_proxy}
export EMBEDDER_PORT=6006
export MMEI_EMBEDDING_ENDPOINT="http://${host_ip}:$EMBEDDER_PORT/v1/encode"
export MM_EMBEDDING_PORT_MICROSERVICE=6000
export ASR_ENDPOINT=http://$host_ip:7066
export ASR_SERVICE_PORT=3001
export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
export REDIS_URL="redis://${host_ip}:6379"
export REDIS_HOST=${host_ip}
export INDEX_NAME="mm-rag-redis"
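After filling in the proxy and `host_ip` values, the script is meant to be sourced so the variables land in the current shell; a sketch, run from the GenAIExamples repository root:

```bash
# Sketch: load the environment and confirm the three new ASR variables are set
source MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh
env | grep '^ASR_'
```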
49 changes: 39 additions & 10 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/README.md
@@ -37,6 +37,9 @@ export LVM_MODEL_ID="llava-hf/llava-v1.6-vicuna-13b-hf"
export WHISPER_MODEL="base"
export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export ASR_ENDPOINT=http://$host_ip:7066
export ASR_SERVICE_PORT=3001
export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
export LVM_SERVICE_HOST_IP=${host_ip}
export MEGA_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/multimodalqna"
@@ -95,7 +98,21 @@ docker build --no-cache -t opea/lvm-tgi:latest --build-arg https_proxy=$https_pr
docker build --no-cache -t opea/dataprep-multimodal-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimodal/redis/langchain/Dockerfile .
```

### 5. Build MegaService Docker Image
### 5. Build ASR Images

Build the Whisper server image:

```bash
docker build --no-cache -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .
```

Build the ASR microservice image:

```bash
docker build --no-cache -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
```
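Alternatively, the same two images can be built through `docker_image_build/build.yaml` (updated later in this PR); a sketch, assuming GenAIComps has already been cloned into that directory to serve as the build context:

```bash
# Sketch: build only the audio-related images via the compose build file
cd GenAIExamples/MultimodalQnA/docker_image_build
docker compose -f build.yaml build --no-cache whisper-service asr
```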

### 6. Build MegaService Docker Image

To construct the MegaService, we use the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the [multimodalqna.py](../../../../multimodalqna.py) Python script. Build the MegaService Docker image with the command below:

@@ -114,16 +131,19 @@ cd GenAIExamples/MultimodalQnA/ui/
docker build --no-cache -t opea/multimodalqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```

Then run the command `docker images`, you will have the following 8 Docker Images:
Then run the command `docker images`; you should see the following 11 Docker images:

1. `opea/dataprep-multimodal-redis:latest`
2. `opea/lvm-tgi:latest`
3. `ghcr.io/huggingface/tgi-gaudi:2.0.6`
4. `opea/retriever-multimodal-redis:latest`
5. `opea/embedding-multimodal:latest`
6. `opea/embedding-multimodal-bridgetower:latest`
7. `opea/multimodalqna:latest`
8. `opea/multimodalqna-ui:latest`
5. `opea/whisper:latest`
6. `opea/asr:latest`
7. `opea/redis-vector-db`
8. `opea/embedding-multimodal:latest`
9. `opea/embedding-multimodal-bridgetower:latest`
10. `opea/multimodalqna:latest`
11. `opea/multimodalqna-ui:latest`

## 🚀 Start Microservices

@@ -189,7 +209,16 @@ curl http://${host_ip}:7000/v1/multimodal_retrieval \
-d "{\"text\":\"test\",\"embedding\":${your_embedding}}"
```

4. TGI LLaVA Gaudi Server
4. asr

```bash
curl ${ASR_SERVICE_ENDPOINT} \
-X POST \
-H "Content-Type: application/json" \
-d '{"byte_str" : "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
```
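If your source audio is not already a WAV file, convert it before encoding; Whisper models are typically fed 16 kHz mono audio. A sketch (assumes `ffmpeg` is installed; `input.mp3` and `sample.wav` are placeholder filenames):

```bash
# Sketch: convert arbitrary audio to 16 kHz mono WAV, then base64-encode it for the request body
ffmpeg -i input.mp3 -ar 16000 -ac 1 sample.wav
base64 -w 0 sample.wav > sample_b64.txt
```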

5. TGI LLaVA Gaudi Server

```bash
curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
@@ -198,7 +227,7 @@ curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
-H 'Content-Type: application/json'
```

5. lvm-tgi
6. lvm-tgi

```bash
curl http://${host_ip}:9399/v1/lvm \
@@ -223,7 +252,7 @@ curl http://${host_ip}:9399/v1/lvm \
-d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
```

6. Multimodal Dataprep Microservice
7. Multimodal Dataprep Microservice

Download a sample video, image, and audio file and create a caption

@@ -297,7 +326,7 @@ curl -X POST \
${DATAPREP_DELETE_FILE_ENDPOINT}
```

7. MegaService
8. MegaService

```bash
curl http://${host_ip}:8888/v1/multimodalqna \
24 changes: 24 additions & 0 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/compose.yaml
@@ -8,6 +8,27 @@ services:
ports:
- "6379:6379"
- "8001:8001"
whisper-service:
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
container_name: whisper-service
ports:
- "7066:7066"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped
asr:
image: ${REGISTRY:-opea}/asr:${TAG:-latest}
container_name: asr-service
ports:
- "${ASR_SERVICE_PORT}:9099"
ipc: host
environment:
ASR_ENDPOINT: ${ASR_ENDPOINT}
ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
dataprep-multimodal-redis:
image: ${REGISTRY:-opea}/dataprep-multimodal-redis:${TAG:-latest}
container_name: dataprep-multimodal-redis
@@ -119,6 +140,7 @@ services:
- embedding-multimodal
- retriever-multimodal-redis
- lvm-tgi
- asr
ports:
- "8888:8888"
environment:
@@ -130,6 +152,8 @@
MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE}
MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP}
LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP}
ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
ipc: host
restart: always
multimodalqna-ui:
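After `docker compose up`, two quick checks for the audio path (a sketch; container names come from the `container_name` fields above):

```bash
# Sketch: confirm the published ASR port mapping and peek at the Whisper server logs
docker port asr-service
docker logs --tail 20 whisper-service
```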
3 changes: 3 additions & 0 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/set_env.sh
@@ -12,6 +12,9 @@ export https_proxy=${your_http_proxy}
export EMBEDDER_PORT=6006
export MMEI_EMBEDDING_ENDPOINT="http://${host_ip}:$EMBEDDER_PORT/v1/encode"
export MM_EMBEDDING_PORT_MICROSERVICE=6000
export ASR_ENDPOINT=http://$host_ip:7066
export ASR_SERVICE_PORT=3001
export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
export REDIS_URL="redis://${host_ip}:6379"
export REDIS_HOST=${host_ip}
export INDEX_NAME="mm-rag-redis"
12 changes: 12 additions & 0 deletions MultimodalQnA/docker_image_build/build.yaml
@@ -59,3 +59,15 @@ services:
dockerfile: comps/dataprep/multimodal/redis/langchain/Dockerfile
extends: multimodalqna
image: ${REGISTRY:-opea}/dataprep-multimodal-redis:${TAG:-latest}
whisper-service:
build:
context: GenAIComps
dockerfile: comps/asr/whisper/dependency/Dockerfile
extends: multimodalqna
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
asr:
build:
context: GenAIComps
dockerfile: comps/asr/whisper/Dockerfile
extends: multimodalqna
image: ${REGISTRY:-opea}/asr:${TAG:-latest}
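A quick way to confirm the two new services are picked up by the build file (a sketch, run from `MultimodalQnA/docker_image_build/`):

```bash
# Sketch: list the services defined in build.yaml and filter for the audio-related ones
docker compose -f build.yaml config --services | grep -E 'whisper-service|asr'
```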