[Issue]: ZeroDivisionError: Weights sum to zero, can't be normalized #619
Comments
Something likely went wrong during your indexing phase. Check the logs from the index run.
Yes, this is due to your locally run embedding model not returning the embeddings in the expected format. OpenAI internally uses base64-encoded floats, while most other models return floats as plain numbers. I've hacked the `encoding_format` into this piece of code to make local search work:

```python
def map_query_to_entities(
    query: str,
    text_embedding_vectorstore: BaseVectorStore,
    text_embedder: BaseTextEmbedding,
    all_entities: list[Entity],
    embedding_vectorstore_key: str = EntityVectorStoreKey.ID,
    include_entity_names: list[str] | None = None,
    exclude_entity_names: list[str] | None = None,
    k: int = 10,
    oversample_scaler: int = 2,
) -> list[Entity]:
    """Extract entities that match a given query using semantic similarity of text embeddings of query and entity descriptions."""
    if include_entity_names is None:
        include_entity_names = []
    if exclude_entity_names is None:
        exclude_entity_names = []
    matched_entities = []
    if query != "":
        # get entities with highest semantic similarity to query
        # oversample to account for excluded entities
        search_results = text_embedding_vectorstore.similarity_search_by_text(
            text=query,
            # added to make the embedding API work; OpenAI uses base64 by default
            text_embedder=lambda t: text_embedder.embed(t, encoding_format="float"),
            k=k * oversample_scaler,
        )
        for result in search_results:
            matched = get_entity_by_key(
                entities=all_entities,
                key=embedding_vectorstore_key,
                value=result.document.id,
            )
            if matched:
                matched_entities.append(matched)
    else:
        all_entities.sort(key=lambda x: x.rank if x.rank else 0, reverse=True)
        matched_entities = all_entities[:k]
    # filter out excluded entities
    if exclude_entity_names:
        matched_entities = [
            entity
            for entity in matched_entities
            if entity.title not in exclude_entity_names
        ]
    # add entities in the include_entity list
    included_entities = []
    for entity_name in include_entity_names:
        included_entities.extend(get_entity_by_name(all_entities, entity_name))
    return included_entities + matched_entities
```
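To illustrate why `encoding_format` matters: OpenAI-compatible endpoints can return an embedding either as a JSON list of floats or as a base64-encoded buffer of little-endian float32 values, and a client that expects one format but receives the other ends up with garbage or empty vectors. This is a hypothetical sketch (the `decode_embedding` helper is not part of GraphRAG) showing how the two representations relate:

```python
import base64
import struct

def decode_embedding(data):
    """Hypothetical helper: handle both embedding response formats."""
    if isinstance(data, str):
        # base64 branch: decode the raw bytes, then unpack them as float32
        raw = base64.b64decode(data)
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    # float branch: already a list of numbers
    return [float(x) for x in data]

vec = [0.1, -0.2, 0.3]
# simulate a base64 response: pack the vector as little-endian float32
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode()
decoded = decode_embedding(encoded)
# round-trip agrees up to float32 precision
assert all(abs(a - b) < 1e-6 for a, b in zip(decoded, vec))
```

A model that serves plain float lists while the client tries to base64-decode them (or vice versa) will produce exactly the kind of broken vectors that later surface as the zero-weights error.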
It seems this still doesn't work for me even after making that change.
It's because you're using a local model.
Consolidating alternate model issues here: #657 |
Hello, I'm currently running into the same problem. I am using an Azure OpenAI instance, and it returns the same error.
I also encountered this situation, but since I was not connecting to OpenAI, I checked the api_base and api_key, and found no problem with them.
Where should I place this code?
This may be caused by an invalid api_key.
An invalid API key can also cause this error, because the embedding model then returns an incorrect response. I once changed the LLM api_base in settings.xml and updated the api_key in .env to match; but that api_key in .env is not used only by the LLM's api_base, it is also used when calling the embedding api_base. As a result, the new api_key in .env did not actually work for the embedding api_base (that provider's API); in other words, I had forgotten to also update the embedding api_base.
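To sketch the pitfall described above (the field names and URLs here are assumptions based on a typical GraphRAG settings file, not taken from this thread): the LLM section and the embeddings section each resolve their own api_key and api_base, and both often reference the same environment variable, so updating one provider but not the other can send a key to an endpoint that rejects it:

```yaml
llm:
  api_key: ${GRAPHRAG_API_KEY}          # updated for the new provider
  api_base: https://new-provider.example.com/v1

embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}        # same env var reused here
    api_base: https://old-provider.example.com/v1  # forgotten: still the old base
```

When the embedding call fails or returns malformed data because of the mismatch, the downstream symptom is the zero-weights error rather than a clear authentication message.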
Describe the issue
When I run the query using local scope I got the error of ZeroDivisionError: Weights sum to zero, can't be normalized. But for the Global scope it worked correctly. If any one have the Idea please give the solution.
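For context on where the exception itself originates: the message matches NumPy's weighted average, which raises exactly this error when every weight is zero (for example, when misparsed embeddings make every similarity score come out as 0). A minimal reproduction in plain NumPy, independent of GraphRAG:

```python
import numpy as np

# np.average normalizes by the sum of the weights; when that sum is zero,
# it raises the exact error reported in this issue.
try:
    np.average([1.0, 2.0, 3.0], weights=[0.0, 0.0, 0.0])
except ZeroDivisionError as err:
    print(err)  # Weights sum to zero, can't be normalized
```

So the error is a downstream symptom: the real problem is that the similarity scores (and hence the weights) are all zero, which points back at the embedding step.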
Steps to reproduce
No response
GraphRAG Config Used
No response
Logs and screenshots
No response
Additional Information