
[Bug]: Something wrong with fnllm/tenacity #1666

Open
2 of 3 tasks
liuchuan01 opened this issue Jan 29, 2025 · 3 comments
Labels
awaiting_response Maintainers or community have suggested solutions or requested info, awaiting filer response bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer

Comments


liuchuan01 commented Jan 29, 2025

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

Title: tenacity raises KeyError: 'idle_for' (something wrong with fnllm/tenacity)

Description:

Hello, I hope this message finds you well.

I am building an index. When the pipeline reaches the extract_graph flow, the entities and edges are extracted correctly, but an error occurs during the summarization step, in the _summarize_descriptions_with_llm function.

Here is the relevant code snippet:

# graphrag/index/operations/summarize_descriptions/description_summary_extractor.py (_summarize_descriptions_with_llm)
async def _summarize_descriptions_with_llm(
    self, id: str | tuple[str, str] | list[str], descriptions: list[str]
):
    """Summarize descriptions using the LLM."""
    response = await self._llm(
        self._summarization_prompt.format(**{
            self._entity_name_key: json.dumps(id, ensure_ascii=False),
            self._input_descriptions_key: json.dumps(
                sorted(descriptions), ensure_ascii=False
            ),
        }),
        name="summarize",
        model_parameters={"max_tokens": self._max_summary_length},
    )
    # Calculate result
    return str(response.output.content)

I have not modified this code.

However, I noticed a similar usage in graphrag/index/operations/extract_entities/graph_extractor.py:

response = await self._llm(
    CONTINUE_PROMPT,
    name=f"extract-continuation-{i}",
    history=response.history,
)

That part works correctly, so I tried removing model_parameters={"max_tokens": self._max_summary_length}, but the error persisted.

I am using the deepseek-chat model, but it doesn't seem to be the cause. I suspect something is wrong in fnllm or tenacity.
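My guess at the mechanism (as far as I can tell from reading tenacity's source): tenacity keeps its statistics dict in thread-local storage, so a dict initialized by begin() in one thread is empty when the retry loop resumes on another thread, which could happen here since my config uses async_mode: threaded. The following stdlib-only sketch uses a simplified, hypothetical RetryStats class modeled on that pattern, not tenacity's real code, and reproduces the same KeyError:

```python
import threading

class RetryStats:
    """Simplified stand-in for tenacity's thread-local statistics storage
    (illustration only, not tenacity's actual implementation)."""

    def __init__(self):
        self._local = threading.local()

    @property
    def statistics(self):
        # Each thread sees its own dict; a fresh thread starts empty.
        if not hasattr(self._local, "statistics"):
            self._local.statistics = {}
        return self._local.statistics

    def begin(self):
        # "idle_for" is initialized only in the thread that starts the retry.
        self.statistics["idle_for"] = 0

    def next_action(self, sleep):
        # Mirrors the failing line in tenacity/__init__.py:
        #     self.statistics["idle_for"] += sleep
        self.statistics["idle_for"] += sleep

retry = RetryStats()
retry.begin()  # initialized in the main thread only

errors = []

def worker():
    try:
        retry.next_action(1.0)  # this thread's statistics dict is empty
    except KeyError as exc:
        errors.append(str(exc))

t = threading.Thread(target=worker)
t.start()
t.join()
print(errors)  # ["'idle_for'"]
```

If this is the mechanism, any setup where the AsyncRetrying object's begin() and iter() run on different threads would hit the same line.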

Could you please help me identify and resolve this problem?

Thank you!

Steps to reproduce

Just a normal init and index run:

python -m graphrag init --root ./liuchuan_use/test_first 
python -m graphrag index --root ./liuchuan_use/test_first

Expected Behavior

summarization success

GraphRAG Config Used

### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

models:
  default_chat_model:
    api_key: ${GRAPHRAG_LLM_API_KEY} # set this in the generated .env file
    type: openai_chat # or azure_openai_chat
    model: deepseek-chat
    encoding_model: cl100k_base
    model_supports_json: true # recommended if this is available for your model.
    parallelization_num_threads: 50
    parallelization_stagger: 0.3
    async_mode: threaded # or asyncio
    # audience: "https://cognitiveservices.azure.com/.default"
    api_base: https://api.deepseek.com/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
  default_embedding_model:
    api_key: ${GRAPHRAG_EMBED_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-v3
    encoding_model: cl100k_base
    parallelization_num_threads: 50
    parallelization_stagger: 0.3
    async_mode: threaded # or asyncio
    api_base: https://dashscope.aliyuncs.com/compatible-mode/v1/embeddings
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>


vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output/lancedb
    container_name: default
    overwrite: True

embeddings:
  model_id: default_embedding_model

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Output settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

output:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_output:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

entity_extraction:
  model_id: default_chat_model
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1
  max_input_length: 8000

summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
  max_input_length: 8000

claim_extraction:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  prompt: "prompts/basic_search_system_prompt.txt"

Logs and screenshots

error:

22:11:59,912 graphrag.callbacks.file_workflow_callbacks INFO Error Invoking LLM details={'prompt': '\nYou are a helpful assistant responsible for generating a comprehensive summary of the data provided below.\nGiven one or two entities, and a list of descriptions, all related to the same entity or group of entities.\nPlease concatenate all of these into a single, comprehensive description. Make sure to include information collected from all the descriptions.\nIf the provided descriptions are contradictory, please resolve the contradictions and provide a single, coherent summary.\nMake sure it is written in third person, and include the entity names so we have the full context.\n\n#######\n-Data-\nEntities: "花果山"\nDescription List: ["A mountain in the sea near 傲来国, described as the ancestral vein of ten continents and three islands, and the birthplace of the Stone Monkey", "A mountain known as the home of the monkeys and the location of the Water Curtain Cave"]\n#######\nOutput:\n', 'kwargs': {'name': 'summarize'}}
22:11:59,914 graphrag.index.run.run_pipeline ERROR error running workflow extract_graph
Traceback (most recent call last):
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/run/run_pipeline.py", line 162, in _run_pipeline
    result = await fn(
             ^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/workflows/extract_graph.py", line 53, in run_workflow
    base_entity_nodes, base_relationship_edges = await extract_graph(
                                                 ^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/flows/extract_graph.py", line 57, in extract_graph
    entity_summaries, relationship_summaries = await summarize_descriptions(
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/operations/summarize_descriptions/summarize_descriptions.py", line 146, in summarize_descriptions
    return await get_summarized(entities_df, relationships_df, semaphore)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/operations/summarize_descriptions/summarize_descriptions.py", line 96, in get_summarized
    node_results = await asyncio.gather(*node_futures)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/asyncio/tasks.py", line 349, in __wakeup
    future.result()
  File "/usr/local/anaconda3/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/operations/summarize_descriptions/summarize_descriptions.py", line 138, in do_summarize_descriptions
    results = await strategy_exec(
              ^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/operations/summarize_descriptions/graph_intelligence_strategy.py", line 36, in run_graph_intelligence
    return await run_summarize_descriptions(llm, id, descriptions, callbacks, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/operations/summarize_descriptions/graph_intelligence_strategy.py", line 67, in run_summarize_descriptions
    result = await extractor(id=id, descriptions=descriptions)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/operations/summarize_descriptions/description_summary_extractor.py", line 73, in __call__
    result = await self._summarize_descriptions(id, descriptions)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/operations/summarize_descriptions/description_summary_extractor.py", line 110, in _summarize_descriptions
    result = await self._summarize_descriptions_with_llm(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/CodeEnv/dev2learn/graphrag/graphrag/index/operations/summarize_descriptions/description_summary_extractor.py", line 135, in _summarize_descriptions_with_llm
    response = await self._llm(
               ^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/fnllm/openai/llm/chat.py", line 83, in __call__
    return await self._text_chat_llm(prompt, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/fnllm/openai/llm/features/tools_parsing.py", line 120, in __call__
    return await self._delegate(prompt, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/fnllm/base/base.py", line 112, in __call__
    return await self._invoke(prompt, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/fnllm/base/base.py", line 128, in _invoke
    return await self._decorated_target(prompt, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/fnllm/services/json.py", line 71, in invoke
    return await delegate(prompt, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/fnllm/services/retryer.py", line 109, in invoke
    result = await execute_with_retry()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/fnllm/services/retryer.py", line 93, in execute_with_retry
    async for a in AsyncRetrying(
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
    do = await self.iter(retry_state=self._retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/liuchuan/Library/Caches/pypoetry/virtualenvs/graphrag-qU9pFR3c-py3.11/lib/python3.11/site-packages/tenacity/__init__.py", line 428, in next_action
    self.statistics["idle_for"] += sleep
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'idle_for'
22:11:59,921 graphrag.callbacks.file_workflow_callbacks INFO Error running pipeline! details=None
22:11:59,953 graphrag.cli.index ERROR Errors occurred during the pipeline run, see logs for more details.

Additional Information

  • GraphRAG Version: 0f743ae
  • Operating System: MacOS Intel
  • Python Version: 3.11
  • Related Issues: null
@liuchuan01 liuchuan01 added bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer labels Jan 29, 2025
@natoverse
Collaborator

We've been working through a number of issues since adopting fnllm for API call management. 2.0.0 was just released, which we believe resolves these issues.

@natoverse natoverse added the awaiting_response Maintainers or community have suggested solutions or requested info, awaiting filer response label Feb 26, 2025

github-actions bot commented Mar 5, 2025

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.

@github-actions github-actions bot added the stale Used by auto-resolve bot to flag inactive issues label Mar 5, 2025
@shoukaiseki

I also encountered the same problem with graphrag==2.0.0.

@github-actions github-actions bot removed the stale Used by auto-resolve bot to flag inactive issues label Mar 9, 2025