[Bug] GraphRag dataprep/retriever error #1509

Open
2 of 8 tasks
lianhao opened this issue Feb 10, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@lianhao
Collaborator

lianhao commented Feb 10, 2025

Priority

P1-Stopper

OS type

Ubuntu

Hardware type

Xeon-SPR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source
  • Other

Deploy method

  • Docker
  • Docker Compose
  • Kubernetes Helm Charts
  • Kubernetes GMC
  • Other

Running nodes

Single Node

What's the version?

docker compose file version: git commit 388d3eb
opea/dataprep:latest image digest id: 0215f9d, "Created": "2025-01-30T16:30:01.965528862Z"
opea/retriever:latest image digest id: 63e67cf, "Created": "2025-01-30T16:28:22.444907191Z"

Description

After following the GraphRAG README to deploy GraphRAG, I hit the following issues:

  1. Uploading nke-10k-2023.pdf does not appear to work even though the returned HTTP status code is 200: the dataprep container log shows an EmptyNetworkError while building communities.

  2. Retrieving the uploaded document fails with "Internal Server Error"; the retriever container log shows AttributeError: 'EmbedDoc' object has no attribute 'messages'.

Reproduce steps

  1. Follow the GraphRAG README to deploy GraphRAG:
cd GraphRAG/docker_compose/intel/hpu/gaudi
source set_env.sh
docker compose -f compose.yaml up -d
  2. Upload the document, then check the dataprep container log. The log (see the Raw log section below) contains errors:
wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf
curl -X POST "http://localhost:6004/v1/dataprep/ingest" -H "Content-Type: multipart/form-data" -F "files=@./nke-10k-2023.pdf"
docker compose logs dataprep-neo4j-llamaindex
  3. Retrieve the document; the request fails with "Internal Server Error":
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://localhost:7000/v1/retrieval -X POST -H 'Content-Type: application/json' -d "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${your_embedding}}"
docker compose logs
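
For anyone who wants to script the retrieval step instead of using curl, here is a minimal sketch with the Python standard library. It sends the same JSON body as the curl command above (a `text` field plus a random 768-float `embedding`) to the same URL. The function names `build_payload` and `post_retrieval` are made up for this example, and the 30-second timeout is an arbitrary choice.

```python
import json
import random
import urllib.error
import urllib.request

# Endpoint taken from the curl command in the reproduce steps.
RETRIEVAL_URL = "http://localhost:7000/v1/retrieval"


def build_payload(text: str, dim: int = 768) -> dict:
    """Build the same JSON body as the curl command: a query plus a random embedding."""
    embedding = [random.uniform(-1, 1) for _ in range(dim)]
    return {"text": text, "embedding": embedding}


def post_retrieval(payload: dict, url: str = RETRIEVAL_URL, timeout: float = 30.0):
    """POST the payload and return (status_code, body), also for HTTP error responses."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # An "Internal Server Error" response lands here; keep the body for the report.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `post_retrieval(build_payload("What is the revenue of Nike in 2023?"))` against the deployment from step 1 should reproduce the failure from step 3 as an error status rather than an exception, which makes it easier to capture alongside the container logs.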

Raw log

docker compose logs dataprep-neo4j-llamaindex
dataprep-neo4j-server  | /usr/local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing LLMChain from langchain root module is no longer supported. Please use langchain.chains.LLMChain instead.
dataprep-neo4j-server  |   warnings.warn(
dataprep-neo4j-server  | /usr/local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing PromptTemplate from langchain root module is no longer supported. Please use langchain_core.prompts.PromptTemplate instead.
dataprep-neo4j-server  |   warnings.warn(
dataprep-neo4j-server  | [2025-02-10 08:06:46,154] [    INFO] - opea_dataprep_neo4j_llamaindex - NO OpenAI API Key. TGI/VLLM/TEI endpoints will be used.
dataprep-neo4j-server  | [2025-02-10 08:06:47,418] [    INFO] - opea_dataprep_neo4j_llamaindex - Time taken to initialize: 1.2641932964324951
dataprep-neo4j-server  | [2025-02-10 08:06:47,420] [    INFO] - Base service - CORS is enabled.
dataprep-neo4j-server  | [2025-02-10 08:06:47,421] [    INFO] - Base service - Setting up HTTP server
dataprep-neo4j-server  | [2025-02-10 08:06:47,422] [    INFO] - Base service - Uvicorn server setup on port 5000
dataprep-neo4j-server  | INFO:     Waiting for application startup.
dataprep-neo4j-server  | INFO:     Application startup complete.
dataprep-neo4j-server  | INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
dataprep-neo4j-server  | [2025-02-10 08:06:47,433] [    INFO] - Base service - HTTP server setup successful
dataprep-neo4j-server  | [2025-02-10 08:06:47,437] [    INFO] - opea_dataprep_microservice - OPEA Dataprep Microservice is starting...
dataprep-neo4j-server  | [2025-02-10 08:08:58,641] [    INFO] - opea_dataprep_microservice - [ ingest ] files:[UploadFile(filename='nke-10k-2023.pdf', size=2397936, headers=Headers({'content-disposition': 'form-data; name="files"; filename="nke-10k-2023.pdf"', 'content-type': 'application/pdf'}))]
dataprep-neo4j-server  | [2025-02-10 08:08:58,641] [    INFO] - opea_dataprep_microservice - [ ingest ] link_list:None
dataprep-neo4j-server  | [2025-02-10 08:08:58,641] [    INFO] - opea_dataprep_loader - [ dataprep loader ] ingest files
dataprep-neo4j-server  | [2025-02-10 08:08:58,641] [    INFO] - opea_dataprep_neo4j_llamaindex - files:[UploadFile(filename='nke-10k-2023.pdf', size=2397936, headers=Headers({'content-disposition': 'form-data; name="files"; filename="nke-10k-2023.pdf"', 'content-type': 'application/pdf'}))]
dataprep-neo4j-server  | [2025-02-10 08:08:58,641] [    INFO] - opea_dataprep_neo4j_llamaindex - link_list:None
dataprep-neo4j-server  | [2025-02-10 08:08:58,641] [    INFO] - opea_dataprep_neo4j_llamaindex - skip_ingestion:annotation=NoneType required=False default=False json_schema_extra={}
dataprep-neo4j-server  | [2025-02-10 08:08:58,641] [    INFO] - opea_dataprep_neo4j_llamaindex - NO OpenAI API Key. TGI/VLLM/TEI endpoints will be used.
dataprep-neo4j-server  | [2025-02-10 08:08:58,661] [    INFO] - opea_dataprep_neo4j_llamaindex - Time taken to initialize: 0.020206928253173828
dataprep-neo4j-server  | [2025-02-10 08:08:58,975] [   ERROR] - opea_dataprep_neo4j_llamaindex - Error building communities: EmptyNetworkError
dataprep-neo4j-server  | [2025-02-10 08:08:58,976] [   ERROR] - opea_dataprep_neo4j_llamaindex - Error building communities: EmptyNetworkError
dataprep-neo4j-server  | Traceback (most recent call last):
dataprep-neo4j-server  |   File "/home/user/comps/dataprep/src/integrations/neo4j_llamaindex.py", line 656, in build_communities
dataprep-neo4j-server  |     await index.property_graph_store.build_communities()
dataprep-neo4j-server  |   File "/home/user/comps/dataprep/src/integrations/neo4j_llamaindex.py", line 130, in build_communities
dataprep-neo4j-server  |     community_hierarchical_clusters = hierarchical_leiden(nx_graph, max_cluster_size=self.max_cluster_size)
dataprep-neo4j-server  |                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataprep-neo4j-server  |   File "<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x7fe02644dda0>", line 304, in hierarchical_leiden
dataprep-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/graspologic/partition/leiden.py", line 588, in hierarchical_leiden
dataprep-neo4j-server  |     hierarchical_clusters_native = gn.hierarchical_leiden(
dataprep-neo4j-server  |                                    ^^^^^^^^^^^^^^^^^^^^^^^
dataprep-neo4j-server  | leiden.EmptyNetworkError: EmptyNetworkError
dataprep-neo4j-server  |
dataprep-neo4j-server  | [2025-02-10 08:08:58,976] [    INFO] - opea_dataprep_neo4j_llamaindex - {'status': 200, 'message': 'Data preparation succeeded'}
dataprep-neo4j-server  | [2025-02-10 08:08:58,976] [    INFO] - opea_dataprep_microservice - [ ingest ] Output generated: {'status': 200, 'message': 'Data preparation succeeded'}
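
The log above shows the community-building error being logged and the ingest still reporting `{'status': 200, 'message': 'Data preparation succeeded'}`. A minimal sketch of the alternative behaviour, where an empty graph is reported as a failure, could look like the following. This is not the dataprep code and not the fix from any PR: `EmptyGraphError`, `build_communities`, and `ingest` are hypothetical stand-ins, the graph is modelled as a plain list of edges, and the assumption that the real error comes from a graph with no edges is a guess based on the exception name.

```python
from typing import Iterable, Tuple

Edge = Tuple[str, str]


class EmptyGraphError(RuntimeError):
    """Hypothetical error raised when there is nothing to cluster."""


def build_communities(edges: Iterable[Edge]) -> int:
    """Stand-in for the community-building step; returns the number of nodes seen.

    Only the assumed precondition is modelled here: the graph must contain
    at least one edge before clustering can run.
    """
    nodes = set()
    for source, target in edges:
        nodes.update((source, target))
    if not nodes:
        raise EmptyGraphError("no entities or relations were extracted")
    return len(nodes)


def ingest(edges: Iterable[Edge]) -> dict:
    """Report a failure instead of success when community building fails."""
    try:
        node_count = build_communities(edges)
    except EmptyGraphError as err:
        return {"status": 500, "message": f"Data preparation failed: {err}"}
    return {"status": 200, "message": f"Data preparation succeeded ({node_count} nodes)"}
```

With this shape, `ingest([])` returns a status of 500 with the reason in the message, so a client sees the failure without reading the container log.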


docker compose logs retriever-neo4j-llamaindex
... ...
retriever-neo4j-server  | [2025-02-10 08:09:05,522] [   ERROR] - opea_retrievers_microservice - [ retrieval ] Error during retrieval invocation: 'EmbedDoc' object has no attribute 'messages'
retriever-neo4j-server  | ERROR:    Exception in ASGI application
retriever-neo4j-server  | Traceback (most recent call last):
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
retriever-neo4j-server  |     result = await app(  # type: ignore[func-returns-value]
retriever-neo4j-server  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
retriever-neo4j-server  |     return await self.app(scope, receive, send)
retriever-neo4j-server  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
retriever-neo4j-server  |     await super().__call__(scope, receive, send)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 112, in __call__
retriever-neo4j-server  |     await self.middleware_stack(scope, receive, send)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
retriever-neo4j-server  |     raise exc
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
retriever-neo4j-server  |     await self.app(scope, receive, _send)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
retriever-neo4j-server  |     raise exc
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
retriever-neo4j-server  |     await self.app(scope, receive, send_wrapper)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
retriever-neo4j-server  |     await self.app(scope, receive, send)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
retriever-neo4j-server  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
retriever-neo4j-server  |     raise exc
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
retriever-neo4j-server  |     await app(scope, receive, sender)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
retriever-neo4j-server  |     await self.middleware_stack(scope, receive, send)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
retriever-neo4j-server  |     await route.handle(scope, receive, send)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
retriever-neo4j-server  |     await self.app(scope, receive, send)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
retriever-neo4j-server  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
retriever-neo4j-server  |     raise exc
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
retriever-neo4j-server  |     await app(scope, receive, sender)
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
retriever-neo4j-server  |     response = await f(request)
retriever-neo4j-server  |                ^^^^^^^^^^^^^^^^
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
retriever-neo4j-server  |     raw_response = await run_endpoint_function(
retriever-neo4j-server  |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
retriever-neo4j-server  |     return await dependant.call(**values)
retriever-neo4j-server  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
retriever-neo4j-server  |   File "/home/user/comps/retrievers/src/opea_retrievers_microservice.py", line 71, in ingest_files
retriever-neo4j-server  |     response = await loader.invoke(input)
retriever-neo4j-server  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^
retriever-neo4j-server  |   File "/home/user/comps/cores/common/component.py", line 163, in invoke
retriever-neo4j-server  |     return await self.component.invoke(*args, **kwargs)
retriever-neo4j-server  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
retriever-neo4j-server  |   File "/home/user/comps/retrievers/src/integrations/neo4j.py", line 308, in invoke
retriever-neo4j-server  |     if isinstance(input.messages, str):
retriever-neo4j-server  |                   ^^^^^^^^^^^^^^
retriever-neo4j-server  |   File "/usr/local/lib/python3.11/site-packages/docarray/base_doc/doc.py", line 260, in __getattr__
retriever-neo4j-server  |     return super().__getattribute__(item)
retriever-neo4j-server  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
retriever-neo4j-server  | AttributeError: 'EmbedDoc' object has no attribute 'messages'
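
The traceback ends at `input.messages` being read on an `EmbedDoc`, which suggests the retriever received a request type other than the one it expected. A minimal sketch of a more defensive lookup is below. The two dataclasses are hypothetical stand-ins and not the real request types, `extract_query` is a made-up helper, and the chat-style list of `role`/`content` dicts is an assumption. Whether the actual fix changes the retriever or the request format in the README is not stated in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class EmbedDoc:
    """Hypothetical stand-in for the request type named in the traceback."""
    text: str
    embedding: List[float] = field(default_factory=list)


@dataclass
class ChatRequest:
    """Hypothetical stand-in for a request type that carries `messages`."""
    messages: Union[str, List[dict]]


def extract_query(request) -> str:
    """Return the query string from either request shape, or raise a clear error."""
    messages = getattr(request, "messages", None)
    if messages is None:
        # Fall back to the `text` field sent by the curl command in the reproduce steps.
        text = getattr(request, "text", None)
        if text is None:
            raise TypeError(f"unsupported request type: {type(request).__name__}")
        return text
    if isinstance(messages, str):
        return messages
    # Assumed chat format: a list of {"role": ..., "content": ...} dicts; use the last user turn.
    for message in reversed(messages):
        if message.get("role") == "user":
            return message.get("content", "")
    raise ValueError("no user message found")
```

An unsupported input then produces a `TypeError` naming the request type, which is easier to act on than an `AttributeError` raised deep inside the handler.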

Attachments

No response

@xiguiw
Collaborator

xiguiw commented Feb 19, 2025

@rbrugaro

PRs submitted to fix this issue:
opea-project/GenAIComps#1292
#1567
