
feat: add support for multimodal #453

Merged: 19 commits, Nov 29, 2024

Conversation

@leehuwuj (Collaborator) commented Nov 27, 2024

Summary by CodeRabbit

  • New Features

    • Introduced MultiModalSynthesizer for enhanced multi-modal query processing, integrating both text and image nodes.
    • Added asynchronous synthesis capabilities with the asynthesize method.
    • Enhanced settings to support multi-modal language models based on specific model configurations.
    • Added support for multimodal indexes with integration into LlamaCloud for versatile data management.
  • Bug Fixes

    • Improved error handling for model support initialization.
  • Documentation

    • Updated import statements and method signatures for clarity and maintainability.


changeset-bot bot commented Nov 27, 2024

⚠️ No Changeset found

Latest commit: af02674

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
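
For reference, a changeset is a small markdown file under the `.changeset/` directory that names the affected package and a semver bump type. Based on the changes list in this PR (a patch for "create-llama"), the file added later would look roughly like this (the description text is an assumption):

```md
---
"create-llama": patch
---

Add support for multimodal indexes with LlamaCloud
```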



coderabbitai bot commented Nov 27, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

The changes introduce a new class MultiModalSynthesizer that extends BaseSynthesizer, designed for summarizing text nodes and utilizing a multi-modal LLM for generating responses. An asynchronous method asynthesize is added to process both image and text nodes. The synthesize method is also introduced for synchronous operation. The create_query_engine function is modified to support multi-modal functionality, checking for a retrieval_mode and integrating the RetrieverQueryEngine. Additionally, the settings.py file is updated to include a _multi_modal_llm variable and enhance the initialization of multi-modal capabilities based on the model name.
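
The flow described in the walkthrough can be sketched as follows. This is an illustrative stand-in, not the actual llama-index classes: `Node`, `split_nodes`, and `synthesize` are simplified names, and the real implementation summarizes text with a tree-summarize synthesizer and sends the images to a multi-modal LLM.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Node:
    content: str
    is_image: bool = False

def split_nodes(nodes: List[Node]) -> Tuple[List[Node], List[Node]]:
    # Mirrors _get_image_and_text_nodes: separate image nodes from text nodes.
    images = [n for n in nodes if n.is_image]
    texts = [n for n in nodes if not n.is_image]
    return images, texts

def synthesize(query: str, nodes: List[Node]) -> str:
    image_nodes, text_nodes = split_nodes(nodes)
    # 1. Summarize the text nodes first to keep the final prompt within token
    #    limits (a plain join stands in for tree summarization here).
    summary = " ".join(n.content for n in text_nodes)
    # 2. Format the text-QA prompt from the summary and the query.
    prompt = f"Context: {summary}\nQuestion: {query}"
    # 3. A multi-modal LLM would receive this prompt plus the image documents;
    #    this sketch just reports what would be sent.
    return f"{len(image_nodes)} image(s) attached; prompt sent: {prompt!r}"

print(synthesize("What is shown?", [Node("a cat on a mat"), Node("cat.png", is_image=True)]))
```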

Changes

  • templates/components/engines/python/agent/tools/query_engine.py
    • Added MultiModalSynthesizer class.
    • Added asynthesize and synthesize methods.
    • Updated create_query_engine method signature.
  • templates/components/settings/python/settings.py
    • Added variable _multi_modal_llm.
    • Added get_multi_modal_llm function.
    • Updated init_openai method for model initialization.
  • .changeset/blue-hornets-boil.md
    • Added patch "create-llama" for multi-modal index support with LlamaCloud.

Possibly related PRs

  • refactor: make components resuable for chat llm #202: The changes in this PR focus on enhancing the reusability of components for chat LLMs, which may relate to the multi-modal capabilities introduced in the main PR, as both aim to improve the functionality of LLM applications.
  • feat: Use selected LlamaCloud index in multi-agent template #350: This PR implements functionality to utilize a selected LlamaCloud index within a multi-agent template, which aligns with the multi-modal enhancements in the main PR that also involve integrating different data types for improved response generation.
  • Don't need to run generate script for LlamaCloud #352: This PR specifies that there is no longer a requirement to execute the generate script for LlamaCloud, simplifying processes that may be related to the multi-modal functionalities introduced in the main PR.
  • Enhance data type #378: The changes in this PR focus on enhancing data types, which could be relevant to the structured Response object introduced in the main PR that handles synthesized responses.
  • Add support for local models via Hugging Face #414: This PR adds support for local models via Hugging Face, which could complement the multi-modal capabilities introduced in the main PR by expanding the types of models that can be integrated into the system.



@leehuwuj leehuwuj marked this pull request as ready for review November 29, 2024 02:04
@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
templates/components/settings/python/settings.py (2)

78-81: Improve multi-modal LLM initialization with documentation.

The multi-modal initialization could be more explicit and better documented:

+    # Initialize multi-modal LLM if the model supports it, otherwise keep it as None
     if model_name in GPT4V_MODELS:
         global multi_modal_llm
         multi_modal_llm = OpenAIMultiModal(model=model_name)
+    else:
+        global multi_modal_llm
+        multi_modal_llm = None

Also consider adding a docstring to the function explaining the side effect of modifying the global multi_modal_llm.


7-8: Consider future multi-modal support for other providers.

The current implementation only handles multi-modal capabilities for OpenAI. As other providers (like Gemini) add multi-modal support, consider creating a more generic pattern for multi-modal initialization across different providers.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between da4505a and 1e647c2.

📒 Files selected for processing (2)
  • templates/components/engines/python/agent/tools/query_engine.py (2 hunks)
  • templates/components/settings/python/settings.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
templates/components/engines/python/agent/tools/query_engine.py (1)

Pattern templates/**: For files under the templates folder, do not report 'Missing Dependencies Detected' errors.

templates/components/settings/python/settings.py (1)

Pattern templates/**: For files under the templates folder, do not report 'Missing Dependencies Detected' errors.

🔇 Additional comments (3)
templates/components/settings/python/settings.py (1)

2-8: LGTM! Clean implementation of multi-modal LLM singleton.

The imports and global variable declaration are well-structured with proper type hints.

templates/components/engines/python/agent/tools/query_engine.py (2)

134-137: ⚠️ Potential issue

Conflict with image node retrieval in 'auto_routed' mode

The comment on line 135 states that image nodes are not supported for auto_routed or chunk retrieval modes. However, kwargs["retrieve_image_nodes"] is set to True when retrieval_mode is 'auto_routed', which could lead to inconsistent behavior.

Consider adjusting the logic to ensure retrieve_image_nodes is only set to True when the retrieval mode supports image nodes. For example:

 if index.__class__.__name__ == "LlamaCloudIndex":
     retrieval_mode = kwargs.get("retrieval_mode")
     if retrieval_mode is None:
         kwargs["retrieval_mode"] = "auto_routed"
-        if multi_modal_llm:
+    if multi_modal_llm and retrieval_mode != "auto_routed":
         # Note: image nodes are not supported for auto_routed or chunk retrieval mode
         kwargs["retrieve_image_nodes"] = True
         return MultiModalQueryEngine(
             retriever=index.as_retriever(**kwargs),
             multi_modal_llm=multi_modal_llm,
         )

24-35: ⚠️ Potential issue

Initialize self._multi_modal_llm in the __init__ method

The methods synthesize and asynthesize use self._multi_modal_llm, but it is not initialized in the __init__ method. This could lead to an AttributeError at runtime.

Apply this diff to fix the issue:

 def __init__(
     self,
     text_synthesizer: Optional[BaseSynthesizer] = None,
+    multi_modal_llm=None,
     *args,
     **kwargs,
 ):
     super().__init__(*args, **kwargs)
+    self._multi_modal_llm = multi_modal_llm
     # Use a response synthesizer for text nodes summarization
     self._text_synthesizer = text_synthesizer or get_response_synthesizer(
         streaming=False,
         response_mode=ResponseMode.TREE_SUMMARIZE,
     )

Likely invalid or redundant comment.

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)

19-22: Enhance class docstring for better documentation.

The current docstring could be more descriptive. Consider adding:

  • Purpose of the class and when to use it
  • Key features and capabilities
  • Example usage
 class MultiModalQueryEngine(SimpleMultiModalQueryEngine):
     """
-    A multi-modal query engine that splits the retrieval results into chunks then summarizes each chunk to reduce the number of tokens in the response.
+    A multi-modal query engine that processes both text and image nodes to generate comprehensive responses.
+    
+    This engine extends SimpleMultiModalQueryEngine by adding text node summarization capabilities
+    to handle large text contexts efficiently. It splits retrieval results into chunks and
+    summarizes them to reduce token usage while maintaining context quality.
+    
+    Key features:
+    - Efficient text node summarization using tree summarize mode
+    - Integrated handling of both text and image nodes
+    - Support for both synchronous and asynchronous operations
+    
+    Example:
+        engine = MultiModalQueryEngine(
+            retriever=index.as_retriever(),
+            multi_modal_llm=multi_modal_llm
+        )
+        response = engine.query("Describe the image and relate it to the text context")
     """

51-115: Reduce code duplication in synthesis methods.

The synthesize and asynthesize methods share almost identical code. Consider extracting the common logic into a private helper method.

+    async def _prepare_synthesis(
+        self,
+        query_bundle: QueryBundle,
+        nodes: List[NodeWithScore],
+    ) -> tuple[str, List[NodeWithScore], List[NodeWithScore]]:
+        image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
+        text_response = self._summarize_text_nodes(
+            query_bundle=query_bundle,
+            nodes=text_nodes,
+        )
+        fmt_prompt = self._text_qa_template.format(
+            context_str=text_response,
+            query_str=query_bundle.query_str,
+        )
+        return fmt_prompt, image_nodes, text_nodes
+
     def synthesize(
         self,
         query_bundle: QueryBundle,
         nodes: List[NodeWithScore],
         additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
     ) -> RESPONSE_TYPE:
-        image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
-        text_response = self._summarize_text_nodes(
-            query_bundle=query_bundle,
-            nodes=text_nodes,
-        )
-        fmt_prompt = self._text_qa_template.format(
-            context_str=text_response,
-            query_str=query_bundle.query_str,
-        )
+        fmt_prompt, image_nodes, text_nodes = self._prepare_synthesis(query_bundle, nodes)
         llm_response = self._multi_modal_llm.complete(
             prompt=fmt_prompt,
             image_documents=[
                 image_node.node
                 for image_node in image_nodes
                 if isinstance(image_node.node, ImageNode)
             ],
         )
         return Response(
             response=str(llm_response),
             source_nodes=nodes,
             metadata={"text_nodes": text_nodes, "image_nodes": image_nodes},
         )

     async def asynthesize(
         self,
         query_bundle: QueryBundle,
         nodes: List[NodeWithScore],
         additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
     ) -> RESPONSE_TYPE:
-        image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
-        text_response = self._summarize_text_nodes(
-            query_bundle=query_bundle,
-            nodes=text_nodes,
-        )
-        fmt_prompt = self._text_qa_template.format(
-            context_str=text_response,
-            query_str=query_bundle.query_str,
-        )
+        fmt_prompt, image_nodes, text_nodes = await self._prepare_synthesis(query_bundle, nodes)
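
The deduplication idea can also be sketched with the shared step kept synchronous (it only splits nodes and formats strings, so there is no I/O to await), which lets both the sync and async paths call it directly; the names here are simplified stand-ins for illustration.

```python
from typing import List, Tuple

def prepare_synthesis(
    query: str,
    text_nodes: List[str],
    image_nodes: List[str],
    template: str = "Context: {ctx}\nQuery: {q}",
) -> Tuple[str, List[str], List[str]]:
    # Shared, CPU-only preparation: summarize text (a join stands in for
    # tree summarization) and format the prompt. No async needed.
    summary = " ".join(text_nodes)
    prompt = template.format(ctx=summary, q=query)
    return prompt, image_nodes, text_nodes

prompt, images, texts = prepare_synthesis("Q?", ["alpha", "beta"], ["img1"])
print(prompt)
```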
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1e647c2 and bc8df66.

📒 Files selected for processing (1)
  • templates/components/engines/python/agent/tools/query_engine.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

Pattern templates/**: For files under the templates folder, do not report 'Missing Dependencies Detected' errors.

🔇 Additional comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)

24-35: ⚠️ Potential issue

Initialize missing _text_qa_template in constructor.

The _text_qa_template is used in the synthesize and asynthesize methods but is not initialized in the constructor. This could lead to runtime errors.


131-141: Clarify image node support limitations and verify retrieval modes.

The comment about image node support limitations could be more explicit. Also, we should verify the retrieval modes that don't support image nodes.

Consider updating the comment to be more explicit:

-            # Note: image nodes are not supported for auto_routed or chunk retrieval mode
+            # Enable image node retrieval. Note: While we set this to True,
+            # image nodes may not be effectively retrieved in 'auto_routed' or
+            # 'chunk' retrieval modes due to underlying implementation limitations.

@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
templates/components/engines/python/agent/tools/query_engine.py (1)

89-102: Simplify conditional logic and update docstring

The nested conditions could be simplified and the docstring should be updated to reflect new parameters:

 def create_query_engine(index, **kwargs) -> BaseQueryEngine:
     """
     Create a query engine for the given index.
 
     Args:
         index: The index to create a query engine for.
-        params (optional): Additional parameters for the query engine, e.g: similarity_top_k
+        **kwargs: Additional parameters including:
+            - retrieval_mode: The retrieval mode to use (default: "auto_routed" for LlamaCloudIndex)
+            - similarity_top_k: Number of top results to retrieve
+            - filters: Query filters to apply
     """
     # ... rest of the code ...
-    if index.__class__.__name__ == "LlamaCloudIndex":
-        retrieval_mode = kwargs.get("retrieval_mode")
-        if retrieval_mode is None:
-            kwargs["retrieval_mode"] = "auto_routed"
-            mm_model = multi_modal_llm.get()
-            if mm_model:
-                kwargs["retrieve_image_nodes"] = True
-                return RetrieverQueryEngine(
-                    retriever=index.as_retriever(**kwargs),
-                    response_synthesizer=MultiModalSynthesizer(
-                        multimodal_model=mm_model
-                    ),
-                )
+    if index.__class__.__name__ == "LlamaCloudIndex" and kwargs.get("retrieval_mode") is None:
+        kwargs["retrieval_mode"] = "auto_routed"
+        mm_model = multi_modal_llm.get()
+        if mm_model:
+            kwargs["retrieve_image_nodes"] = True
+            return RetrieverQueryEngine(
+                retriever=index.as_retriever(**kwargs),
+                response_synthesizer=MultiModalSynthesizer(
+                    multimodal_model=mm_model
+                ),
+            )
templates/components/settings/python/settings.py (1)

80-82: Consider future multi-modal support for other providers.

As other providers (like Gemini) may support multi-modal capabilities, consider abstracting the multi-modal initialization into a separate function that can be called from each provider's init function.

Example structure:

def init_multi_modal(provider: str, model_name: str) -> Optional[MultiModalLLM]:
    """Initialize multi-modal LLM based on provider and model."""
    if not os.getenv("MODEL"):
        return None
        
    match provider:
        case "openai":
            if model_name in GPT4V_MODELS:
                return OpenAIMultiModal(model=model_name)
        # Add cases for other providers
    return None
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between bc8df66 and 88d2918.

📒 Files selected for processing (2)
  • templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
  • templates/components/settings/python/settings.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
templates/components/engines/python/agent/tools/query_engine.py (1)

Pattern templates/**: For files under the templates folder, do not report 'Missing Dependencies Detected' errors.

templates/components/settings/python/settings.py (1)

Pattern templates/**: For files under the templates folder, do not report 'Missing Dependencies Detected' errors.

🔇 Additional comments (6)
templates/components/engines/python/agent/tools/query_engine.py (4)

2-23: LGTM: Imports are well-organized and complete

All necessary dependencies for multi-modal functionality are properly imported.


26-41: LGTM: Class initialization is properly implemented

The class correctly:

  • Extends TreeSummarize
  • Initializes text_qa_template with a default value
  • Stores the multimodal_model

42-48: Remove unused parameter

The additional_source_nodes parameter is not used in the implementation.


93-94: Verify multi_modal_llm initialization

Ensure that multi_modal_llm.get() is properly initialized before use.

✅ Verification successful

Multi-modal LLM initialization is properly handled

The code is correctly implemented:

  • multi_modal_llm is defined as a ContextVar in settings.py
  • It's properly initialized in settings.py when the model is in GPT4V_MODELS
  • The query engine safely checks for the model's existence with if mm_model: before using it
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check multi_modal_llm initialization
# Search for multi_modal_llm initialization in settings.py
rg -A 5 "multi_modal_llm\s*=|def\s+init.*modal" 

# Search for potential usage before initialization
rg "multi_modal_llm\.get\(\)"

Length of output: 782


Script:

#!/bin/bash
# Check for multi_modal_llm class/module definition and initialization
ast-grep --pattern 'class $_ {
  $$$
  get() {
    $$$
  }
  $$$
}'

# Search for multi_modal_llm imports and related code
rg -B 2 -A 2 "multi_modal_llm|multimodal" --type python

Length of output: 158


Script:

#!/bin/bash
# Search for multi_modal_llm imports and related code without file type restriction
rg -B 2 -A 2 "multi_modal_llm|multimodal"

# Find Python files that might contain the implementation
fd "\.py$" --exec grep -l "multi_modal_llm\|multimodal" {} \;

Length of output: 5513

templates/components/settings/python/settings.py (2)

8-10: Improve documentation for the multi_modal_llm variable.

The docstring should clarify that this is a singleton for multi-modal LLM since Settings doesn't support setting MultiModalLLM.

 multi_modal_llm: ContextVar[Optional[MultiModalLLM]] = ContextVar(
-    "multi_modal_llm", default=None
+    "multi_modal_llm",  # Singleton for multi-modal LLM (as `Settings` does not support setting `MultiModalLLM`)
+    default=None
 )
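
The `ContextVar` singleton pattern referenced above can be sketched in isolation as follows; `FakeMultiModalLLM` and the model list are placeholders standing in for `OpenAIMultiModal` and `GPT4V_MODELS`, not the actual implementations.

```python
from contextvars import ContextVar
from typing import Optional

class FakeMultiModalLLM:
    # Placeholder for OpenAIMultiModal; only stores the model name.
    def __init__(self, model: str):
        self.model = model

# Module-level ContextVar acting as the multi-modal LLM singleton,
# since Settings does not support setting a MultiModalLLM directly.
multi_modal_llm: ContextVar[Optional[FakeMultiModalLLM]] = ContextVar(
    "multi_modal_llm", default=None
)

VISION_MODELS = {"gpt-4o", "gpt-4o-mini"}  # stand-in for GPT4V_MODELS

def init_model(model_name: str) -> None:
    # Only set the singleton when the model supports image input;
    # otherwise the default of None is kept.
    if model_name in VISION_MODELS:
        multi_modal_llm.set(FakeMultiModalLLM(model=model_name))

init_model("gpt-4o")
mm = multi_modal_llm.get()
print(mm.model if mm else "no multi-modal model")
```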

73-73: Fix typo in default model name.

The default model name "gpt-4o-mini" appears to be a typo.

-    model_name = os.getenv("MODEL", "gpt-4o-mini")
+    model_name = os.getenv("MODEL", "gpt-4")

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

121-134: Improve logging and model validation

Consider these improvements:

  1. Replace print statement with proper logging
  2. Add validation for multi-modal model capabilities
             mm_model = multi_modal_llm.get()
             if mm_model:
+                if not hasattr(mm_model, 'complete') or not hasattr(mm_model, 'acomplete'):
+                    raise ValueError("Invalid multi-modal model: missing required methods")
                 kwargs["retrieve_image_nodes"] = True
-                print("Using multi-modal model")
+                logger.info("Initialized query engine with multi-modal capabilities")
                 return RetrieverQueryEngine(
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 88d2918 and fa17d39.

📒 Files selected for processing (1)
  • templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

Pattern templates/**: For files under the templates folder, do not report 'Missing Dependencies Detected' errors.

🔇 Additional comments (3)
templates/components/engines/python/agent/tools/query_engine.py (3)

2-23: LGTM! Well-organized imports

The imports are properly organized and all are necessary for the multi-modal functionality implementation.


42-48: Remove unused parameter

The additional_source_nodes parameter is not used in either synthesize or asynthesize methods.

     def synthesize(
         self,
         query: QueryTextType,
         nodes: List[NodeWithScore],
-        additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
         **response_kwargs: Any,
     ) -> RESPONSE_TYPE:

Apply the same change to asynthesize method.

Also applies to: 74-80


59-66: Add error handling for LLM operations

Both synchronous and asynchronous LLM calls lack error handling.

+        try:
             llm_response = self._multi_modal_llm.complete(
                 prompt=fmt_prompt,
                 image_documents=[
                     image_node.node
                     for image_node in image_nodes
                     if isinstance(image_node.node, ImageNode)
                 ],
             )
+        except Exception as e:
+            raise RuntimeError(f"Failed to generate response: {str(e)}") from e

Apply similar error handling to acomplete call in asynthesize method.

Also applies to: 91-98
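
The suggested error handling can be exercised end to end with a minimal async stub; `FakeLLM` and `safe_complete` are illustrative names, not part of the PR.

```python
import asyncio

class FakeLLM:
    # Stub standing in for a multi-modal LLM's async completion endpoint.
    async def acomplete(self, prompt: str) -> str:
        if not prompt:
            raise ValueError("empty prompt")
        return f"answer to: {prompt}"

async def safe_complete(llm: FakeLLM, prompt: str) -> str:
    try:
        return await llm.acomplete(prompt)
    except Exception as e:
        # Chain the original exception so the failing stage stays visible.
        raise RuntimeError(f"Failed to generate response: {e}") from e

print(asyncio.run(safe_complete(FakeLLM(), "hello")))
```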

@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

26-30: Enhance class documentation

While the docstring provides a basic description, it would be helpful to include:

  • Parameters description
  • Usage examples
  • Return type expectations
 class MultiModalSynthesizer(TreeSummarize):
     """
-    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    
+    Args:
+        multimodal_model (Optional[MultiModalLLM]): The multi-modal language model to use
+        text_qa_template (Optional[BasePromptTemplate]): Template for text QA formatting
+    
+    Example:
+        >>> synthesizer = MultiModalSynthesizer(multimodal_model=llm)
+        >>> response = await synthesizer.asynthesize(query, nodes)
+    """
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fa17d39 and 0ffb7ff.

📒 Files selected for processing (2)
  • templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
  • templates/components/settings/python/settings.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
templates/components/engines/python/agent/tools/query_engine.py (1)

Pattern templates/**: For files under the templates folder, do not report 'Missing Dependencies Detected' errors.

templates/components/settings/python/settings.py (1)

Pattern templates/**: For files under the templates folder, do not report 'Missing Dependencies Detected' errors.

🔇 Additional comments (4)
templates/components/engines/python/agent/tools/query_engine.py (1)

31-41: ⚠️ Potential issue

Add model validation and fix template initialization

The initialization needs two improvements:

  1. Validate the multi-modal model parameter
  2. Fix the template initialization as noted in the previous review
 def __init__(
     self,
     multimodal_model: Optional[MultiModalLLM] = None,
     text_qa_template: Optional[BasePromptTemplate] = None,
     *args,
     **kwargs,
 ):
     super().__init__(*args, **kwargs)
+    if multimodal_model is None:
+        raise ValueError("multimodal_model is required")
     self._multi_modal_llm = multimodal_model
-    self._text_qa_template = text_qa_template or DEFAULT_TREE_SUMMARIZE_PROMPT_SEL
+    self._text_qa_template = text_qa_template or DEFAULT_TREE_SUMMARIZE_PROMPT_SEL.get_prompt()

Likely invalid or redundant comment.

templates/components/settings/python/settings.py (3)

7-9: Improve documentation for global variable.

The comment should better explain why a global variable is needed instead of using the Settings class.

-# `Settings` does not support setting `MultiModalLLM`
-# so we use a global variable to store it
+# Singleton for multi-modal LLM (as `Settings` does not support setting `MultiModalLLM`)

83-85: Improve multi-modal initialization logic.

The multi-modal initialization should only occur if the model is explicitly set through the environment variable, not just based on model name matching.

-    if model_name in GPT4V_MODELS:
+    if os.getenv("MODEL") and model_name in GPT4V_MODELS:
         global _multi_modal_llm
         _multi_modal_llm = OpenAIMultiModal(model=model_name)

76-76: Fix typo in default model name.

The default model name "gpt-4o-mini" appears to be a typo.

-    model_name = os.getenv("MODEL", "gpt-4o-mini")
+    model_name = os.getenv("MODEL", "gpt-4")

@coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0ffb7ff and b9d336c.

📒 Files selected for processing (1)
  • templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

Pattern templates/**: For files under the templates folder, do not report 'Missing Dependencies Detected' errors.

🔇 Additional comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)

1-23: LGTM! Well-organized imports

The imports are properly organized and include all necessary dependencies for multi-modal functionality.


42-72: ⚠️ Potential issue

Add comprehensive error handling and remove unused parameter

The asynthesize method needs several improvements:

  1. Remove unused additional_source_nodes parameter
  2. Add error handling for text summarization and LLM calls
  3. Add type validation for query

Apply these changes:

    async def asynthesize(
        self,
        query: QueryTextType,
        nodes: List[NodeWithScore],
-       additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
        **response_kwargs: Any,
    ) -> RESPONSE_TYPE:
        image_nodes, text_nodes = _get_image_and_text_nodes(nodes)

+       if not nodes:
+           raise ValueError("No nodes provided for synthesis")

+       try:
            # Summarize the text nodes to avoid exceeding the token limit
            text_response = str(await super().asynthesize(query, nodes))
+       except Exception as e:
+           raise RuntimeError(f"Failed to summarize text nodes: {str(e)}") from e

+       if not hasattr(query, 'query_str'):
+           raise ValueError("Query must have 'query_str' attribute")
        fmt_prompt = self._text_qa_template.format(
            context_str=text_response,
-           query_str=query.query_str,  # type: ignore
+           query_str=query.query_str,
        )

+       try:
            llm_response = await self._multi_modal_llm.acomplete(
                prompt=fmt_prompt,
                image_documents=[
                    image_node.node
                    for image_node in image_nodes
                    if isinstance(image_node.node, ImageNode)
                ],
            )
+       except Exception as e:
+           raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e

        return Response(
            response=str(llm_response),
            source_nodes=nodes,
            metadata={"text_nodes": text_nodes, "image_nodes": image_nodes},
        )

Likely invalid or redundant comment.
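The `_get_image_and_text_nodes` helper used in these suggestions never appears in the diff. A minimal sketch of what such a partition could look like — the stub node classes here are hypothetical stand-ins for llama_index's `NodeWithScore`/`ImageNode`, not the PR's actual implementation:

```python
from typing import List, Tuple

# Hypothetical stand-ins for llama_index's node types, so the
# partitioning logic can run standalone.
class ImageNode:
    def __init__(self, image_url: str):
        self.image_url = image_url

class TextNode:
    def __init__(self, text: str):
        self.text = text

class NodeWithScore:
    def __init__(self, node, score: float = 1.0):
        self.node = node
        self.score = score

def get_image_and_text_nodes(
    nodes: List[NodeWithScore],
) -> Tuple[List[NodeWithScore], List[NodeWithScore]]:
    # Split retrieved nodes by payload type: image nodes go to the
    # multi-modal LLM, everything else to the text synthesizer.
    image_nodes = [n for n in nodes if isinstance(n.node, ImageNode)]
    text_nodes = [n for n in nodes if not isinstance(n.node, ImageNode)]
    return image_nodes, text_nodes

nodes = [
    NodeWithScore(TextNode("intro")),
    NodeWithScore(ImageNode("chart.png")),
    NodeWithScore(TextNode("summary")),
]
images, texts = get_image_and_text_nodes(nodes)
```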

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

26-29: Enhance class docstring with more details

The current docstring could be more informative. Consider adding parameter descriptions, usage examples, and return type information.

 class MultiModalSynthesizer(BaseSynthesizer):
     """
-    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    A synthesizer that processes both text and image nodes to generate multi-modal responses.
+    
+    Args:
+        multimodal_model (MultiModalLLM): The multi-modal language model for processing
+        response_synthesizer (Optional[BaseSynthesizer]): Synthesizer for text node processing
+        text_qa_template (Optional[BasePromptTemplate]): Template for formatting QA prompts
+    
+    Example:
+        >>> synthesizer = MultiModalSynthesizer(
+        ...     multimodal_model=multi_modal_llm,
+        ...     response_synthesizer=get_response_synthesizer()
+        ... )
+        >>> response = await synthesizer.asynthesize(query, nodes)
     """
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between b9d336c and 3171d73.

📒 Files selected for processing (1)
  • templates/components/engines/python/agent/tools/query_engine.py (1 hunks)

🔇 Additional comments (4)
templates/components/engines/python/agent/tools/query_engine.py (4)

44-50: 🛠️ Refactor suggestion

Remove unused parameter

The additional_source_nodes parameter is never used in the implementation.

     async def asynthesize(
         self,
         query: QueryTextType,
         nodes: List[NodeWithScore],
-        additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
         **response_kwargs: Any,
     ) -> RESPONSE_TYPE:

Likely invalid or redundant comment.


63-70: ⚠️ Potential issue

Add error handling for LLM operations

The multi-modal LLM call should include error handling.

+        try:
             llm_response = await self._multi_modal_llm.acomplete(
                 prompt=fmt_prompt,
                 image_documents=[
                     image_node.node
                     for image_node in image_nodes
                     if isinstance(image_node.node, ImageNode)
                 ],
             )
+        except Exception as e:
+            raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e

Likely invalid or redundant comment.


53-61: ⚠️ Potential issue

Add error handling and type validation

The text summarization and prompt formatting need proper error handling and type validation.

         # Summarize the text nodes to avoid exceeding the token limit
+        try:
             text_response = str(
                 await self._response_synthesizer.asynthesize(query, text_nodes)
             )
+        except Exception as e:
+            raise RuntimeError(f"Failed to summarize text nodes: {str(e)}") from e

+        if not hasattr(query, 'query_str'):
+            raise ValueError("Query must have 'query_str' attribute")
         fmt_prompt = self._text_qa_template.format(
             context_str=text_response,
-            query_str=query.query_str,  # type: ignore
+            query_str=query.query_str,
         )

Likely invalid or redundant comment.


93-102: 🛠️ Refactor suggestion

Improve multi-modal query engine creation

The current implementation has several limitations:

  1. Multi-modal support is restricted to LlamaCloudIndex
  2. MultiModalSynthesizer is not used for response synthesis

     if index.__class__.__name__ == "LlamaCloudIndex":
         retrieval_mode = kwargs.get("retrieval_mode")
         if retrieval_mode is None:
             kwargs["retrieval_mode"] = "auto_routed"
-            if get_multi_modal_llm():
-                kwargs["retrieve_image_nodes"] = True
-                return RetrieverQueryEngine(
-                    retriever=index.as_retriever(**kwargs),
-                    response_synthesizer=get_response_synthesizer(),
-                )
+            multi_modal_llm = get_multi_modal_llm()
+            if multi_modal_llm:
+                kwargs["retrieve_image_nodes"] = True
+                return RetrieverQueryEngine(
+                    retriever=index.as_retriever(**kwargs),
+                    response_synthesizer=MultiModalSynthesizer(
+                        multimodal_model=multi_modal_llm,
+                        response_synthesizer=get_response_synthesizer()
+                    ),
+                )

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

26-29: Enhance class docstring with more details

The current docstring could be more informative. Consider adding:

  • Parameter descriptions
  • Return value description
  • Usage example
 class MultiModalSynthesizer(BaseSynthesizer):
     """
-    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    
+    Args:
+        multimodal_model (MultiModalLLM): The multi-modal language model to use
+        response_synthesizer (Optional[BaseSynthesizer]): Synthesizer for text responses
+        text_qa_template (Optional[BasePromptTemplate]): Template for text QA
+    
+    Returns:
+        RESPONSE_TYPE: A response containing both text and image analysis
+    
+    Example:
+        >>> synthesizer = MultiModalSynthesizer(
+        ...     multimodal_model=get_multi_modal_llm(),
+        ...     response_synthesizer=get_response_synthesizer()
+        ... )
     """
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 3171d73 and 235159e.

📒 Files selected for processing (1)
  • templates/components/engines/python/agent/tools/query_engine.py (1 hunks)

🔇 Additional comments (3)
templates/components/engines/python/agent/tools/query_engine.py (3)

31-43: ⚠️ Potential issue

Add parameter validation for required arguments

The constructor should validate required parameters and properly handle optional ones.

     def __init__(
         self,
         multimodal_model: MultiModalLLM,
-        response_synthesizer: Optional[BaseSynthesizer],
+        response_synthesizer: Optional[BaseSynthesizer] = None,
         text_qa_template: Optional[BasePromptTemplate] = None,
         *args,
         **kwargs,
     ):
         super().__init__(*args, **kwargs)
+        if multimodal_model is None:
+            raise ValueError("multimodal_model is required")
+        if response_synthesizer is None:
+            raise ValueError("response_synthesizer is required")
         self._multi_modal_llm = multimodal_model
         self._response_synthesizer = response_synthesizer
         self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL

44-76: ⚠️ Potential issue

Add comprehensive error handling for async operations

The method needs proper error handling for async operations and text processing.

     async def asynthesize(
         self,
         query: QueryTextType,
         nodes: List[NodeWithScore],
-        additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
         **response_kwargs: Any,
     ) -> RESPONSE_TYPE:
         image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
 
+        try:
             text_response = str(
                 await self._response_synthesizer.asynthesize(query, text_nodes)
             )
+        except Exception as e:
+            raise RuntimeError(f"Failed to synthesize text response: {str(e)}") from e
 
         fmt_prompt = self._text_qa_template.format(
             context_str=text_response,
             query_str=query.query_str,  # type: ignore
         )
 
+        try:
             llm_response = await self._multi_modal_llm.acomplete(
                 prompt=fmt_prompt,
                 image_documents=[
                     image_node.node
                     for image_node in image_nodes
                     if isinstance(image_node.node, ImageNode)
                 ],
             )
+        except Exception as e:
+            raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e

93-106: 🛠️ Refactor suggestion

Improve multi-modal support flexibility

The current implementation restricts multi-modal support to LlamaCloudIndex. Consider making it available for other index types when multi-modal LLM is available.

-    if index.__class__.__name__ == "LlamaCloudIndex":
-        retrieval_mode = kwargs.get("retrieval_mode")
-        if retrieval_mode is None:
-            kwargs["retrieval_mode"] = "auto_routed"
-            if get_multi_modal_llm():
-                kwargs["retrieve_image_nodes"] = True
-                return RetrieverQueryEngine(
-                    retriever=index.as_retriever(**kwargs),
-                    response_synthesizer=MultiModalSynthesizer(
-                        multimodal_model=get_multi_modal_llm(),
-                        response_synthesizer=get_response_synthesizer(),
-                    ),
-                )
+    # Set auto-routed mode for LlamaCloudIndex
+    if index.__class__.__name__ == "LlamaCloudIndex" and kwargs.get("retrieval_mode") is None:
+        kwargs["retrieval_mode"] = "auto_routed"
+
+    # Enable multi-modal support if available
+    multi_modal_llm = get_multi_modal_llm()
+    if multi_modal_llm:
+        kwargs["retrieve_image_nodes"] = True
+        return RetrieverQueryEngine(
+            retriever=index.as_retriever(**kwargs),
+            response_synthesizer=MultiModalSynthesizer(
+                multimodal_model=multi_modal_llm,
+                response_synthesizer=get_response_synthesizer(),
+            ),
+        )

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

82-85: Enhance class documentation

The class documentation could be more descriptive about its functionality and requirements.

Consider expanding the docstring:

 class MultiModalSynthesizer(BaseSynthesizer):
     """
-    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    A synthesizer that processes both text and image nodes to generate multi-modal responses.
+    
+    This synthesizer:
+    1. Separates text and image nodes from the input
+    2. Summarizes text nodes using a response synthesizer
+    3. Combines the summarized text with images using a multi-modal LLM
+    
+    Args:
+        multimodal_model: The multi-modal LLM to use for generating responses
+        response_synthesizer: The synthesizer to use for text node summarization
+        text_qa_template: Optional template for formatting the text query
     """
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 235159e and 2e8e89a.

📒 Files selected for processing (1)
  • templates/components/engines/python/agent/tools/query_engine.py (2 hunks)

🔇 Additional comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

87-99: ⚠️ Potential issue

Improve constructor parameter validation and initialization

The constructor needs validation for required parameters and proper initialization of templates.

Apply these changes:

     def __init__(
         self,
         multimodal_model: MultiModalLLM,
         response_synthesizer: Optional[BaseSynthesizer],
         text_qa_template: Optional[BasePromptTemplate] = None,
         *args,
         **kwargs,
     ):
         super().__init__(*args, **kwargs)
+        if multimodal_model is None:
+            raise ValueError("multimodal_model is required")
+        if response_synthesizer is None:
+            raise ValueError("response_synthesizer is required")
         self._multi_modal_llm = multimodal_model
         self._response_synthesizer = response_synthesizer
-        self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL
+        self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL.get_prompt()

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

81-188: Consider extracting common logic to reduce duplication

The synthesize and asynthesize methods share significant logic. Consider extracting the common parts into private helper methods.

+    def _prepare_prompt(
+        self,
+        query: QueryTextType,
+        text_response: str,
+    ) -> str:
+        if not hasattr(query, 'query_str'):
+            raise ValueError("Query must have 'query_str' attribute")
+        return self._text_qa_template.format(
+            context_str=text_response,
+            query_str=query.query_str,
+        )
+
+    def _prepare_image_documents(
+        self,
+        image_nodes: List[NodeWithScore],
+    ) -> List[ImageNode]:
+        return [
+            image_node.node
+            for image_node in image_nodes
+            if isinstance(image_node.node, ImageNode)
+        ]
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 2e8e89a and 46096ba.

📒 Files selected for processing (1)
  • templates/components/engines/python/agent/tools/query_engine.py (2 hunks)

🔇 Additional comments (3)
templates/components/engines/python/agent/tools/query_engine.py (3)

155-188: ⚠️ Potential issue

Apply the same improvements to the synchronous synthesize method

The synchronous method has the same issues as its asynchronous counterpart.

Apply the same improvements as suggested for the asynthesize method:

  1. Remove unused parameter
  2. Add error handling
  3. Fix type checking

86-98: ⚠️ Potential issue

Add parameter validation and fix template initialization

The constructor needs validation for the required multimodal_model parameter and proper initialization of the text QA template.

     def __init__(
         self,
         multimodal_model: MultiModalLLM,
         response_synthesizer: Optional[BaseSynthesizer] = None,
         text_qa_template: Optional[BasePromptTemplate] = None,
         *args,
         **kwargs,
     ):
         super().__init__(*args, **kwargs)
+        if multimodal_model is None:
+            raise ValueError("multimodal_model is required")
         self._multi_modal_llm = multimodal_model
         self._response_synthesizer = response_synthesizer or get_response_synthesizer()
-        self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL
+        self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL.get_prompt()

Likely invalid or redundant comment.


118-153: ⚠️ Potential issue

Improve error handling and remove unused parameter in asynthesize

The method needs several improvements:

  1. Remove unused additional_source_nodes parameter
  2. Add error handling for text summarization and LLM calls
  3. Fix type checking for query.query_str
     async def asynthesize(
         self,
         query: QueryTextType,
         nodes: List[NodeWithScore],
-        additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
         **response_kwargs: Any,
     ) -> RESPONSE_TYPE:
+        if not nodes:
+            raise ValueError("No nodes provided for synthesis")
+
         image_nodes, text_nodes = _get_image_and_text_nodes(nodes)

         if len(image_nodes) == 0:
             return await self._response_synthesizer.asynthesize(query, text_nodes)

-        # Summarize the text nodes to avoid exceeding the token limit
-        text_response = str(
-            await self._response_synthesizer.asynthesize(query, text_nodes)
-        )
+        try:
+            text_response = str(
+                await self._response_synthesizer.asynthesize(query, text_nodes)
+            )
+            if not hasattr(query, 'query_str'):
+                raise ValueError("Query must have 'query_str' attribute")
+            fmt_prompt = self._text_qa_template.format(
+                context_str=text_response,
+                query_str=query.query_str,
+            )
+            llm_response = await self._multi_modal_llm.acomplete(
+                prompt=fmt_prompt,
+                image_documents=[
+                    image_node.node
+                    for image_node in image_nodes
+                    if isinstance(image_node.node, ImageNode)
+                ],
+            )
+        except Exception as e:
+            raise RuntimeError(f"Failed to generate response: {str(e)}") from e

Likely invalid or redundant comment.
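The `raise … from e` pattern suggested throughout these reviews preserves the original exception as `__cause__`, so the upstream traceback is not lost when wrapping in a domain-level error. A minimal illustration:

```python
def flaky_llm_call():
    # Stand-in for an LLM call that fails (e.g. API error, rate limit).
    raise ValueError("rate limit exceeded")

def generate_response():
    try:
        return flaky_llm_call()
    except Exception as e:
        # Wrap in a domain-level error but chain the original as the cause.
        raise RuntimeError(f"Failed to generate multi-modal response: {e}") from e

try:
    generate_response()
except RuntimeError as err:
    caught = err          # keep a reference; Python clears `err` after the block
    cause = err.__cause__
```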

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)

35-47: Simplify multi-modal configuration logic

The current implementation has nested conditions that make the code harder to follow. Consider separating the concerns:

     multimodal_llm = get_multi_modal_llm()
+    # Configure multi-modal support
     if multimodal_llm:
         kwargs["response_synthesizer"] = MultiModalSynthesizer(
             multimodal_model=multimodal_llm,
         )
+        kwargs["retrieve_image_nodes"] = True

+    # Configure LlamaCloudIndex settings
     if index.__class__.__name__ == "LlamaCloudIndex":
-        if kwargs.get("retrieval_mode") is None:
-            kwargs["retrieval_mode"] = "auto_routed"
-        if multimodal_llm:
-            kwargs["retrieve_image_nodes"] = True
+        kwargs.setdefault("retrieval_mode", "auto_routed")
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 46096ba and a7f56a7.

📒 Files selected for processing (1)
  • templates/components/engines/python/agent/tools/query_engine.py (2 hunks)

🔇 Additional comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)

133-136: ⚠️ Potential issue

Add type validation for query attributes

The type ignore comment suggests potential type issues that should be handled explicitly:

+        if not hasattr(query, 'query_str'):
+            raise ValueError("Query must have 'query_str' attribute")
         fmt_prompt = self._text_qa_template.format(
             context_str=text_response,
-            query_str=query.query_str,  # type: ignore
+            query_str=query.query_str,
         )

Likely invalid or redundant comment.


138-145: ⚠️ Potential issue

Add error handling for LLM operations

The LLM call could fail for various reasons (API errors, rate limits, etc.):

+        try:
             llm_response = await self._multi_modal_llm.acomplete(
                 prompt=fmt_prompt,
                 image_documents=[
                     image_node.node
                     for image_node in image_nodes
                     if isinstance(image_node.node, ImageNode)
                 ],
             )
+        except Exception as e:
+            raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between a7f56a7 and af02674.

📒 Files selected for processing (1)
  • templates/components/engines/python/agent/tools/query_engine.py (2 hunks)

🔇 Additional comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)

134-137: 🛠️ Refactor suggestion

Add type checking for query.query_str access

Replace the type ignore comment with proper type checking:

+        if not hasattr(query, 'query_str'):
+            raise ValueError("Query must have 'query_str' attribute")
         fmt_prompt = self._text_qa_template.format(
             context_str=text_response,
-            query_str=query.query_str,  # type: ignore
+            query_str=query.query_str,
         )

Likely invalid or redundant comment.


139-146: 🛠️ Refactor suggestion

Add error handling for LLM operations

Add proper error handling for LLM operations:

+        try:
             llm_response = await self._multi_modal_llm.acomplete(
                 prompt=fmt_prompt,
                 image_documents=[
                     image_node.node
                     for image_node in image_nodes
                     if isinstance(image_node.node, ImageNode)
                 ],
             )
+        except Exception as e:
+            raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e

Likely invalid or redundant comment.

@marcusschiesser marcusschiesser merged commit f9a057d into main Nov 29, 2024
46 checks passed
@marcusschiesser marcusschiesser deleted the feat/multimodal branch November 29, 2024 11:02